Affine coding with vector clipping

ABSTRACT

Systems and techniques for video coding and compression are described herein. Some examples include affine coding modes for video coding and compression. One example is an apparatus for coding video data that includes a memory and a processor or processors coupled to the memory. The processor(s) are configured to obtain a current coding block from the video data, determine control data for the current coding block, and determine one or more affine motion vector clipping parameters from the control data. The processor(s) are further configured to select a sample of the current coding block, determine an affine motion vector for the sample of the current coding block, and clip the affine motion vector using the one or more affine motion vector clipping parameters to generate a clipped affine motion vector.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/907,664, filed Sep. 29, 2019, and U.S. Provisional Application No. 62/910,384, filed Oct. 3, 2019, which are hereby incorporated by reference in their entirety and for all purposes.

TECHNICAL FIELD

This application is related to video coding and compression. More specifically, this application relates to affine coding modes for video coding and compression.

BACKGROUND

Many devices and systems allow video data to be processed and output for consumption. Digital video data generally includes large amounts of data to meet the demands of video consumers and providers. For example, consumers of video data desire video of high quality, fidelity, resolution, frame rates, and the like. As a result, the large amount of video data that is required to meet these demands places a burden on communication networks and devices that process and store the video data.

Various video coding techniques may be used to compress video data. Video coding techniques can be performed according to one or more video coding standards. For example, video coding standards include high-efficiency video coding (HEVC), advanced video coding (AVC), moving picture experts group (MPEG) 2 part 2 coding, VP9, Alliance of Open Media (AOMedia) Video 1 (AV1), Essential Video Coding (EVC), or the like. Video coding generally utilizes prediction methods (e.g., inter-prediction, intra-prediction, or the like) that take advantage of redundancy present in video images or sequences. An important goal of video coding techniques is to compress video data into a form that uses a lower bit rate, while avoiding or minimizing degradations to video quality. With ever-evolving video services becoming available, encoding techniques with improved coding accuracy or efficiency are needed.

SUMMARY

Systems and methods are described herein for improved video processing. In some examples, video coding techniques are described that use an affine coding mode to encode and decode video data efficiently.

In one illustrative example, a method of coding video data is described. The method comprises: obtaining a current coding block from the video data; determining control data for the current coding block; determining one or more affine motion vector clipping parameters from the control data; selecting a sample of the current coding block; determining an affine motion vector for the sample of the current coding block; and clipping the affine motion vector using the one or more affine motion vector clipping parameters to generate a clipped affine motion vector.

In another illustrative example, a non-transitory computer-readable storage medium is described. The non-transitory computer-readable medium comprises instructions which, when executed by one or more processors, cause the one or more processors to: obtain a current coding block from video data; determine control data for the current coding block; determine one or more affine motion vector clipping parameters from the control data; select a sample of the current coding block; determine an affine motion vector for the sample of the current coding block; and clip the affine motion vector using the one or more affine motion vector clipping parameters to generate a clipped affine motion vector.

In another illustrative example, another apparatus for coding video data is described. The apparatus comprises: means for obtaining a current coding block from the video data; means for determining control data for the current coding block; means for determining one or more affine motion vector clipping parameters from the control data; means for selecting a sample of the current coding block; means for determining an affine motion vector for the sample of the current coding block; and means for clipping the affine motion vector using the one or more affine motion vector clipping parameters to generate a clipped affine motion vector.

In a further illustrative example, an apparatus for coding video data is described. The apparatus comprises: memory; and one or more processors coupled to the memory, the one or more processors being configured to: obtain a current coding block from the video data; determine control data for the current coding block; determine one or more affine motion vector clipping parameters from the control data; select a sample of the current coding block; determine an affine motion vector for the sample of the current coding block; and clip the affine motion vector using the one or more affine motion vector clipping parameters to generate a clipped affine motion vector.

In some aspects, the control data comprises: a location with associated horizontal coordinate and associated vertical coordinate in full-sample units; a width variable specifying a width of the current coding block; a height variable specifying a height of the current coding block; a horizontal change of motion vector; a vertical change of motion vector; and a base scaled motion vector. In some examples, the control data can further include a height of a picture associated with the current coding block in samples and a width of the picture in samples.

In some aspects, the one or more affine motion vector clipping parameters comprise: a horizontal maximum variable; a horizontal minimum variable; a vertical maximum variable; and a vertical minimum variable. In some aspects, the horizontal minimum variable is defined by a maximum value selected from a horizontal minimum picture value and a horizontal minimum motion vector value.

In some aspects, the horizontal minimum picture value is determined from the associated horizontal coordinate. In some aspects, the horizontal minimum motion vector value is determined from a center motion vector value, an array of values based on a resolution value associated with the video data or a block area size (e.g., a current coding block width×height), and the width variable specifying the width of the current coding block. In some aspects, the center motion vector value is determined from the base scaled motion vector, the horizontal change of motion vector, the width variable, and the height variable. In some aspects, the base scaled motion vector corresponds to a top left corner of the current coding block and is determined from control point motion vector values. In some aspects, the vertical maximum variable is defined by a minimum value selected from a vertical maximum picture value and a vertical maximum motion vector value.

In some aspects, the vertical maximum picture value is determined from the height of the picture, the associated vertical coordinate, and the height variable. In some aspects, the vertical maximum motion vector value is determined from a center motion vector value, an array of values based on a resolution value associated with the video data or a block area size (e.g., a current coding block width×height), and the height variable specifying the height of the current coding block.
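
As a minimal, non-normative sketch of the clipping described in these aspects, the following combines a picture-based bound and a motion-vector-based bound for each direction and then clamps the affine motion vector component-wise. The bound formulas, the 1/16-pel precision, and names such as deviation are assumptions for illustration only, not the normative derivation.

def derive_clip_params(x, y, w, h, pic_w, pic_h, mv_center, deviation):
    """Illustrative derivation of affine MV clipping bounds (1/16-pel units).

    x, y:      top-left position of the coding block in full-sample units
    w, h:      block width and height; pic_w, pic_h: picture width and height
    mv_center: (mvx, mvy) motion vector at the block center
    deviation: (dev_x, dev_y) allowed spread around the center MV, e.g.
               taken from a table indexed by block area size (assumption)
    """
    shift = 4  # 1/16-pel precision (assumption)
    # Picture-based bounds keep the referenced area near the picture.
    hor_min_pic = -((x + w) << shift)
    hor_max_pic = (pic_w - x) << shift
    ver_min_pic = -((y + h) << shift)
    ver_max_pic = (pic_h - y) << shift
    # Motion-vector-based bounds keep samples close to the center MV.
    hor_min_mv = mv_center[0] - deviation[0]
    hor_max_mv = mv_center[0] + deviation[0]
    ver_min_mv = mv_center[1] - deviation[1]
    ver_max_mv = mv_center[1] + deviation[1]
    # Combine: each minimum is the larger bound, each maximum the smaller.
    return (max(hor_min_pic, hor_min_mv), min(hor_max_pic, hor_max_mv),
            max(ver_min_pic, ver_min_mv), min(ver_max_pic, ver_max_mv))

def clip_affine_mv(mv, hor_min, hor_max, ver_min, ver_max):
    """Clamp an affine motion vector (mvx, mvy) to the derived bounds."""
    return (min(max(mv[0], hor_min), hor_max),
            min(max(mv[1], ver_min), ver_max))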

In some aspects, examples sequentially obtain a plurality of current coding blocks from the video data; determine a set of affine motion vector clipping parameters on a per coding block basis for blocks of the plurality of current coding blocks; and fetch portions of corresponding reference pictures using the set of affine motion vector clipping parameters on the per block basis for the plurality of current coding blocks.

In some aspects, examples identify a reference picture associated with the current coding block; and store a portion of the reference picture defined by the one or more affine motion vector clipping parameters. In some aspects, examples process the current coding block using reference picture data from a reference picture indicated by the clipped affine motion vector.

In some aspects, the affine motion vector for the sample of the current coding block is determined according to a first base scaled motion vector value, a first horizontal change of motion vector value, a first vertical change of motion vector value, a second base scaled motion vector value, a second horizontal change of motion vector value, a second vertical change of motion vector value, a horizontal coordinate of the sample, and a vertical coordinate of the sample. In some such aspects, the control data comprises values from a derivation table.
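
For illustration, a per-sample affine motion vector of the kind described above can be sketched as a base scaled motion vector plus contributions from horizontal and vertical change-of-motion-vector values. The rounding shift and variable names here are assumptions, not the normative equations.

def affine_mv_at_sample(mv_base, d_hor, d_ver, x, y, shift=7):
    """Per-sample affine motion vector (illustrative).

    mv_base: (mvx0, mvy0) base scaled motion vector (e.g., top-left corner)
    d_hor:   (dHorX, dHorY) horizontal change of motion vector per sample
    d_ver:   (dVerX, dVerY) vertical change of motion vector per sample
    x, y:    sample position relative to the block's top-left corner
    shift:   precision of the change-of-MV values (assumed)
    """
    mvx = mv_base[0] + ((d_hor[0] * x + d_ver[0] * y) >> shift)
    mvy = mv_base[1] + ((d_hor[1] * x + d_ver[1] * y) >> shift)
    return mvx, mvy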

In some aspects, the apparatuses described above can include a mobile device with a camera for capturing one or more pictures. In some aspects, the apparatuses described above can include a display for displaying one or more pictures. The summary is not intended to identify key or essential features of the claimed subject matter, nor is it intended to be used in isolation to determine the scope of the claimed subject matter. The subject matter should be understood by reference to appropriate portions of the entire specification of the patent, any or all drawings, and each claim.

The foregoing, together with other features and embodiments, will become more apparent upon referring to the following specification, claims, and accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Examples of various implementations are described in detail below with reference to the following figures:

FIG. 1 is a block diagram illustrating an encoding device and a decoding device, in accordance with some examples;

FIG. 2A is a conceptual diagram illustrating spatial neighboring motion vector candidates for a merge mode, in accordance with some examples;

FIG. 2B is a conceptual diagram illustrating spatial neighboring motion vector candidates for an advanced motion vector prediction (AMVP) mode, in accordance with some examples;

FIG. 3A is a conceptual diagram illustrating a temporal motion vector predictor (TMVP) candidate, in accordance with some examples;

FIG. 3B is a conceptual diagram illustrating motion vector scaling, in accordance with some examples;

FIG. 4 is a diagram illustrating a history-based motion vector predictor (HMVP) table, in accordance with some examples;

FIG. 5 is a diagram illustrating fetching of non-adjacent spatial merge candidates, in accordance with some examples;

FIG. 6A is a diagram illustrating spatial and temporal locations utilized in MVP prediction, in accordance with some examples;

FIG. 6B is a diagram illustrating aspects of spatial and temporal locations utilized in MVP prediction, in accordance with some examples;

FIG. 6C is a diagram illustrating visiting order for a Spatial-MVP (S-MVP), in accordance with some examples;

FIG. 6D is a diagram illustrating a spatially inverted pattern alternative, in accordance with some examples;

FIG. 7 is a diagram illustrating a simplified affine motion model for a current block, in accordance with some examples;

FIG. 8 is a diagram illustrating a motion vector field of sub-blocks of a block, in accordance with some examples;

FIG. 9 is a diagram illustrating motion vector prediction in an affine inter (AF_INTER) mode, in accordance with some examples;

FIG. 10A and FIG. 10B are diagrams illustrating a motion vector prediction in an affine merge (AF_MERGE) mode, in accordance with some examples;

FIG. 11 is a diagram illustrating an affine motion model for a current block, in accordance with some examples;

FIG. 12 is a diagram illustrating another affine motion model for a current block, in accordance with some examples;

FIG. 13 is a diagram illustrating a current block and a candidate block, in accordance with some examples;

FIG. 14 is a diagram illustrating a current block, control points of the current block, and candidate blocks, in accordance with some examples;

FIG. 15 is a diagram illustrating an affine model in MPEG5 EVC and spatial neighborhood, in accordance with some examples;

FIG. 16 is a diagram illustrating aspects of an affine model and spatial neighborhood, in accordance with some examples;

FIG. 17 is a diagram illustrating aspects of an affine model and spatial neighborhood, in accordance with some examples;

FIG. 18A is a diagram illustrating aspects of clipping using thresholds, in accordance with some examples;

FIG. 18B is a diagram illustrating aspects of clipping using thresholds, in accordance with some examples;

FIG. 18C is a diagram illustrating aspects of clipping using thresholds, in accordance with some examples;

FIG. 19 is a flowchart illustrating a process of coding with an affine mode in accordance with examples described herein;

FIG. 20 is a block diagram illustrating a video encoding device, in accordance with some examples; and

FIG. 21 is a block diagram illustrating a video decoding device, in accordance with some examples.

DETAILED DESCRIPTION

Certain aspects and embodiments of the disclosure are provided below. Some of these aspects and embodiments may be applied independently and some of them may be applied in combination as would be apparent to those of skill in the art. In the following description, for the purposes of explanation, specific details are set forth in order to provide a thorough understanding of embodiments of the application. However, it will be apparent that various embodiments may be practiced without these specific details. The figures and description are not intended to be restrictive.

The ensuing description provides example embodiments only, and is not intended to limit the scope, applicability, or configuration of the disclosure. Rather, the ensuing description of the exemplary embodiments will provide those skilled in the art with an enabling description for implementing an exemplary embodiment. It should be understood that various changes may be made in the function and arrangement of elements without departing from the spirit and scope of the application as set forth in the appended claims.

As stated above, examples are described herein for improved video processing. In some examples, video coding techniques are described that use an affine coding mode to encode and decode video data efficiently. Affine models are models that can be used to approximate flow patterns associated with certain types of image motion in video, particularly flow patterns associated with camera motion (e.g., motion of the point of view or capturing position for a video stream). Video processing systems can include an affine coding mode that is configured to code video using affine motion models. Additional details of affine modes for video coding are described below. Examples described herein include operations and structures that improve the operations of video coding devices by improving memory bandwidth use in an affine coding mode. In some examples, the memory bandwidth improvements are generated by clipping motion vectors used by an affine coding mode, which can reduce the data used in a local buffer by limiting the possible reference area (e.g., and the associated data) used for affine coding.

Some systems use per-sample motion vector generation, which can greatly increase the number of memory access operations used to fetch filter samples for affine coding. A large number of fetching operations can be handled by a system if the local buffer is able to accommodate the reference data, but if the reference data for each fetch is large (e.g., exceeds a local buffer size, such as a size for a decoded picture buffer), the memory bandwidth usage can degrade system performance. By limiting memory bandwidth usage associated with reference picture access, large numbers of fetching operations can be used without degraded memory bandwidth performance, thereby improving device operations. Examples described herein can provide such benefits within the context of a larger video coding system and as part of video coding devices.
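
As a rough, non-normative sketch of why clipping helps, the following estimates the reference-sample bounding box a block would need to fetch with and without clipping its per-sample motion vectors. The filter length, the 1/16-pel units, and the example values are illustrative assumptions.

def reference_area(mvs, block_w, block_h, filter_taps=8, shift=4):
    """Width x height (in samples) of the reference region fetched for a block.

    mvs: per-sample motion vectors in 1/16-pel units (assumed precision).
    Interpolation needs roughly filter_taps extra samples around the MV range.
    """
    xs = [mv[0] >> shift for mv in mvs]
    ys = [mv[1] >> shift for mv in mvs]
    w = block_w + (max(xs) - min(xs)) + filter_taps
    h = block_h + (max(ys) - min(ys)) + filter_taps
    return w * h

# A wide spread of affine MVs forces a large fetch; clipping the MVs to a
# narrow range around the block's center MV shrinks it (illustrative values).
unclipped = [(-640, 320), (640, -320), (0, 0)]
clipped = [(min(max(x, -64), 64), min(max(y, -64), 64)) for x, y in unclipped]
print(reference_area(unclipped, 16, 16))  # larger reference region
print(reference_area(clipped, 16, 16))    # smaller region after clipping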

Video coding devices implement video compression techniques to encode and decode video data efficiently. Video compression techniques may include applying different prediction modes, including spatial prediction (e.g., intra-frame prediction or intra-prediction), temporal prediction (e.g., inter-frame prediction or inter-prediction), inter-layer prediction (across different layers of video data), and/or other prediction techniques to reduce or remove redundancy inherent in video sequences. A video encoder can partition each picture of an original video sequence into rectangular regions referred to as video blocks or coding units (described in greater detail below). These video blocks may be encoded using a particular prediction mode.

Video blocks may be divided in one or more ways into one or more groups of smaller blocks. Blocks can include coding tree blocks, prediction blocks, transform blocks, and/or other suitable blocks. References generally to a “block,” unless otherwise specified, may refer to such video blocks (e.g., coding tree blocks, coding blocks, prediction blocks, transform blocks, or other appropriate blocks or sub-blocks, as would be understood by one of ordinary skill). Further, each of these blocks may also interchangeably be referred to herein as “units” (e.g., coding tree unit (CTU), coding unit, prediction unit (PU), transform unit (TU), or the like). In some cases, a unit may indicate a coding logical unit that is encoded in a bitstream, while a block may indicate a portion of a video frame buffer that a process targets.

For inter-prediction modes, a video encoder can search for a block similar to the block being encoded in a frame (or picture) located in another temporal location, referred to as a reference frame or a reference picture. The video encoder may restrict the search to a certain spatial displacement from the block to be encoded. A best match may be located using a two-dimensional (2D) motion vector that includes a horizontal displacement component and a vertical displacement component. For intra-prediction modes, a video encoder may form the predicted block using spatial prediction techniques based on data from previously encoded neighboring blocks within the same picture.

The video encoder may determine a prediction error. For example, the prediction error can be determined as the difference between the pixel values in the block being encoded and the predicted block. The prediction error can also be referred to as the residual. The video encoder may also apply a transform to the prediction error using transform coding (e.g., using a form of a discrete cosine transform (DCT), a form of a discrete sine transform (DST), or other suitable transform) to generate transform coefficients. After transformation, the video encoder may quantize the transform coefficients. The quantized transform coefficients and motion vectors may be represented using syntax elements, and, along with control information, form a coded representation of a video sequence. In some instances, the video encoder may entropy code syntax elements, thereby further reducing the number of bits needed for their representation.

A video decoder may, using the syntax elements and control information discussed above, construct predictive data (e.g., a predictive block) for decoding a current frame. For example, the video decoder may add the predicted block and the compressed prediction error. The video decoder may determine the compressed prediction error by weighting the transform basis functions using the quantized coefficients. The difference between the reconstructed frame and the original frame is called reconstruction error.

As described in more detail below, systems, apparatuses, methods (also referred to as processes), and computer-readable media (collectively referred to as “systems and techniques”) are described herein for providing improvements to history-based motion vector prediction. The systems and techniques described herein can be applied to one or more of a variety of block based video coding techniques in which video is reconstructed on a block-by-block basis. For example, the systems and techniques described herein can be applied to any of the existing video codecs (e.g., High Efficiency Video Coding (HEVC), Advanced Video Coding (AVC), or other suitable existing video codec), and/or can be an efficient coding tool for any video coding standards being developed and/or future video coding standards, such as, for example, Versatile Video Coding (VVC), the joint exploration model (JEM), VP9, AV1, Essential Video Coding (EVC), and/or other video coding standard in development or to be developed.

Various aspects of the systems and techniques described herein will be discussed herein with respect to the figures. FIG. 1 is a block diagram illustrating an example of a system 100 including an encoding device 104 and a decoding device 112 that can operate in an affine coding mode in accordance with examples described herein. The encoding device 104 may be part of a source device, and the decoding device 112 may be part of a receiving device (also referred to as a client device). The source device and/or the receiving device may include an electronic device, such as a mobile or stationary telephone handset (e.g., smartphone, cellular telephone, or the like), a desktop computer, a laptop or notebook computer, a tablet computer, a set-top box, a television, a camera, a display device, a digital media player, a video gaming console, an Internet Protocol (IP) camera, a server device in a server system including one or more server devices (e.g., a video streaming server system, or other suitable server system), a head-mounted display (HMD), a heads-up display (HUD), smart glasses (e.g., virtual reality (VR) glasses, augmented reality (AR) glasses, or other smart glasses), or any other suitable electronic device.

The components of the system 100 can include and/or can be implemented using electronic circuits or other electronic hardware, which can include one or more programmable electronic circuits (e.g., microprocessors, graphics processing units (GPUs), digital signal processors (DSPs), central processing units (CPUs), and/or other suitable electronic circuits), and/or can include and/or be implemented using computer software, firmware, or any combination thereof, to perform the various operations described herein.

While the system 100 is shown to include certain components, one of ordinary skill will appreciate that the system 100 can include more or fewer components than those shown in FIG. 1. For example, the system 100 can also include, in some instances, one or more memory devices other than the storage 108 and the storage 118 (e.g., one or more random access memory (RAM) components, read-only memory (ROM) components, cache memory components, buffer components, database components, and/or other memory devices), one or more processing devices (e.g., one or more CPUs, GPUs, and/or other processing devices) in communication with and/or electrically connected to the one or more memory devices, one or more wireless interfaces (e.g., including one or more transceivers and a baseband processor for each wireless interface) for performing wireless communications, one or more wired interfaces (e.g., a serial interface such as a universal serial bus (USB) input, a Lightning connector, and/or other wired interface) for performing communications over one or more hardwired connections, and/or other components that are not shown in FIG. 1.

The coding techniques described herein are applicable to video coding in various multimedia applications, including streaming video transmissions (e.g., over the Internet), television broadcasts or transmissions, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 100 can support one-way or two-way video transmission to support applications such as video conferencing, video streaming, video playback, video broadcasting, gaming, and/or video telephony.

The encoding device 104 (or encoder) can be used to encode video data using a video coding standard or protocol to generate an encoded video bitstream. Examples of video coding standards include ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions, and High Efficiency Video Coding (HEVC) or ITU-T H.265. Various extensions to HEVC that deal with multi-layer video coding exist, including the range and screen content coding extensions, 3D video coding (3D-HEVC), multiview extensions (MV-HEVC), and the scalable extension (SHVC). HEVC and its extensions have been developed by the Joint Collaboration Team on Video Coding (JCT-VC) as well as the Joint Collaboration Team on 3D Video Coding Extension Development (JCT-3V) of the ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Motion Picture Experts Group (MPEG).

MPEG and ITU-T VCEG have also formed a joint exploration video team (JVET) to explore and develop new video coding tools for the next generation of video coding standard, named Versatile Video Coding (VVC). The reference software is called the VVC Test Model (VTM). An objective of VVC is to provide a significant improvement in compression performance over the existing HEVC standard, aiding in deployment of higher-quality video services and emerging applications (e.g., such as 360° omnidirectional immersive multimedia, high-dynamic-range (HDR) video, among others). VP9, Alliance of Open Media (AOMedia) Video 1 (AV1), and Essential Video Coding (EVC) are other video coding standards to which the techniques described herein can be applied.

Many embodiments described herein can be performed using video codecs such as VTM, VVC, HEVC, AVC, and/or extensions thereof. However, the techniques and systems described herein may also be applicable to other coding standards, such as MPEG, JPEG (or other coding standard for still images), VP9, AV1, extensions thereof, or other suitable coding standards already available or not yet available or developed. Accordingly, while the techniques and systems described herein may be described with reference to a particular video coding standard, one of ordinary skill in the art will appreciate that the description should not be interpreted to apply only to that particular standard.

Referring to FIG. 1, a video source 102 may provide the video data to the encoding device 104. The video source 102 may be part of the source device, or may be part of a device other than the source device. The video source 102 may include a video capture device (e.g., a video camera, a camera phone, a video phone, or the like), a video archive containing stored video, a video server or content provider providing video data, a video feed interface receiving video from a video server or content provider, a computer graphics system for generating computer graphics video data, a combination of such sources, or any other suitable video source.

The video data from the video source 102 may include one or more input pictures. Pictures may also be referred to as “frames.” A picture or frame is a still image that, in some cases, is part of a video. In some examples, data from the video source 102 can be a still image that is not a part of a video. In HEVC, VVC, and other video coding specifications, a video sequence can include a series of pictures. A picture may include three sample arrays, denoted S_(L), S_(Cb), and S_(Cr). S_(L) is a two-dimensional array of luma samples, S_(Cb) is a two-dimensional array of Cb chrominance samples, and S_(Cr) is a two-dimensional array of Cr chrominance samples. Chrominance samples may also be referred to herein as “chroma” samples. In other instances, a picture may be monochrome and may only include an array of luma samples.

The encoder engine 106 (or encoder) of the encoding device 104 encodes the video data to generate an encoded video bitstream. In some examples, an encoded video bitstream (or “video bitstream” or “bitstream”) is a series of one or more coded video sequences. A coded video sequence (CVS) includes a series of access units (AUs) starting with an AU that has a random access point picture in the base layer and with certain properties up to and not including a next AU that has a random access point picture in the base layer and with certain properties. For example, the certain properties of a random access point picture that starts a CVS may include a RASL flag (e.g., NoRaslOutputFlag) equal to 1. Otherwise, a random access point picture (with RASL flag equal to 0) does not start a CVS. An access unit (AU) includes one or more coded pictures and control information corresponding to the coded pictures that share the same output time. Coded slices of pictures are encapsulated in the bitstream level into data units called network abstraction layer (NAL) units. For example, an HEVC video bitstream may include one or more CVSs including NAL units. Each of the NAL units has a NAL unit header. In one example, the header is one-byte for H.264/AVC (except for multi-layer extensions) and two-byte for HEVC. The syntax elements in the NAL unit header take the designated bits and therefore are visible to all kinds of systems and transport layers, such as Transport Stream, Real-time Transport (RTP) Protocol, File Format, among others.

Two classes of NAL units exist in the HEVC standard, including video coding layer (VCL) NAL units and non-VCL NAL units. VCL NAL units include coded picture data forming a coded video bitstream. For example, a sequence of bits forming the coded video bitstream is present in VCL NAL units. A VCL NAL unit can include one slice or slice segment (described below) of coded picture data, and a non-VCL NAL unit includes control information that relates to one or more coded pictures. In some cases, a NAL unit can be referred to as a packet. An HEVC AU includes VCL NAL units containing coded picture data and non-VCL NAL units (if any) corresponding to the coded picture data. Non-VCL NAL units may contain parameter sets with high-level information relating to the encoded video bitstream, in addition to other information. For example, a parameter set may include a video parameter set (VPS), a sequence parameter set (SPS), and a picture parameter set (PPS). In some cases, each slice or other portion of a bitstream can reference a single active PPS, SPS, and/or VPS to allow the decoding device 112 to access information that may be used for decoding the slice or other portion of the bitstream.

NAL units may contain a sequence of bits forming a coded representation of the video data (e.g., an encoded video bitstream, a CVS of a bitstream, or the like), such as coded representations of pictures in a video. The encoder engine 106 generates coded representations of pictures by partitioning each picture into multiple slices. A slice is independent of other slices so that information in the slice is coded without dependency on data from other slices within the same picture. A slice includes one or more slice segments including an independent slice segment and, if present, one or more dependent slice segments that depend on previous slice segments.

In HEVC, the slices are partitioned into coding tree blocks (CTBs) of luma samples and chroma samples. A CTB of luma samples and one or more CTBs of chroma samples, along with syntax for the samples, are referred to as a coding tree unit (CTU). A CTU may also be referred to as a “tree block” or a “largest coding unit” (LCU). A CTU is the basic processing unit for HEVC encoding. A CTU can be split into multiple coding units (CUs) of varying sizes. A CU contains luma and chroma sample arrays that are referred to as coding blocks (CBs).

The luma and chroma CBs can be further split into prediction blocks (PBs). A PB is a block of samples of the luma component or a chroma component that uses the same motion parameters for inter-prediction or intra-block copy (IBC) prediction (when available or enabled for use). The luma PB and one or more chroma PBs, together with associated syntax, form a prediction unit (PU). For inter-prediction, a set of motion parameters (e.g., one or more motion vectors, reference indices, or the like) is signaled in the bitstream for each PU and is used for inter-prediction of the luma PB and the one or more chroma PBs. The motion parameters can also be referred to as motion information. A CB can also be partitioned into one or more transform blocks (TBs). A TB represents a square block of samples of a color component on which a residual transform (e.g., the same two-dimensional transform in some cases) is applied for coding a prediction residual signal. A transform unit (TU) represents the TBs of luma and chroma samples, and corresponding syntax elements. Transform coding is described in more detail below.

A size of a CU corresponds to a size of the coding mode and may be square in shape. For example, a size of a CU may be 8×8 samples, 16×16 samples, 32×32 samples, 64×64 samples, or any other appropriate size up to the size of the corresponding CTU. The phrase “N×N” is used herein to refer to pixel dimensions of a video block in terms of vertical and horizontal dimensions (e.g., 8 pixels×8 pixels). The pixels in a block may be arranged in rows and columns. In some embodiments, blocks may not have the same number of pixels in a horizontal direction as in a vertical direction. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is intra-prediction mode encoded or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a CTU. A TU can be square or non-square in shape.

According to the HEVC standard, transformations may be performed using transform units (TUs). TUs may vary for different CUs. The TUs may be sized based on the size of PUs within a given CU. The TUs may be the same size or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as residual quad tree (RQT). Leaf nodes of the RQT may correspond to TUs. Pixel difference values associated with the TUs may be transformed to produce transform coefficients. The transform coefficients may be quantized by the encoder engine 106.

Once the pictures of the video data are partitioned into CUs, the encoder engine 106 predicts each PU using a prediction mode. The prediction unit or prediction block is subtracted from the original video data to get residuals (described below). For each CU, a prediction mode may be signaled inside the bitstream using syntax data. A prediction mode may include intra-prediction (or intra-picture prediction) or inter-prediction (or inter-picture prediction). Intra-prediction utilizes the correlation between spatially neighboring samples within a picture. For example, using intra-prediction, each PU is predicted from neighboring image data in the same picture using, for example, DC prediction to find an average value for the PU, planar prediction to fit a planar surface to the PU, direction prediction to extrapolate from neighboring data, or any other suitable types of prediction. Inter-prediction uses the temporal correlation between pictures in order to derive a motion-compensated prediction for a block of image samples. For example, using inter-prediction, each PU is predicted using motion compensation prediction from image data in one or more reference pictures (before or after the current picture in output order). The decision whether to code a picture area using inter-picture or intra-picture prediction may be made, for example, at the CU level.
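
As a small illustrative sketch of the DC intra-prediction idea mentioned above, assuming all reconstructed neighbor samples are available (a simplification, not the exact normative averaging):

def dc_intra_prediction(top_row, left_col, width, height):
    """Predict a width x height block as the rounded average of its
    reconstructed top-row and left-column neighbor samples (illustrative)."""
    neighbors = list(top_row[:width]) + list(left_col[:height])
    dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)
    return [[dc] * width for _ in range(height)]

# Example: a 4x4 PU predicted from its neighboring samples.
pred = dc_intra_prediction(top_row=[100, 102, 104, 106],
                           left_col=[98, 99, 101, 103], width=4, height=4)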

The encoder engine 106 and decoder engine 116 (described in more detail below) may be configured to operate according to VVC. According to VVC, a video coder (such as encoder engine 106 and/or decoder engine 116) partitions a picture into a plurality of coding tree units (CTUs) (where a CTB of luma samples and one or more CTBs of chroma samples, along with syntax for the samples, are referred to as a CTU). The video coder can partition a CTU according to a tree structure, such as a quadtree-binary tree (QTBT) structure or Multi-Type Tree (MTT) structure. The QTBT structure removes the concepts of multiple partition types, such as the separation between CUs, PUs, and TUs of HEVC. A QTBT structure includes two levels, including a first level partitioned according to quadtree partitioning, and a second level partitioned according to binary tree partitioning. A root node of the QTBT structure corresponds to a CTU. Leaf nodes of the binary trees correspond to coding units (CUs).

In an MTT partitioning structure, blocks may be partitioned using a quadtree partition, a binary tree partition, and one or more types of triple tree partitions. A triple tree partition is a partition where a block is split into three sub-blocks. In some examples, a triple tree partition divides a block into three sub-blocks without dividing the original block through the center. The partitioning types in MTT (e.g., quadtree, binary tree, and triple tree) may be symmetrical or asymmetrical.
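
Purely to illustrate the split geometries just described (and not the normative partitioning process), the following returns the child rectangles produced by a quad, binary, or triple tree split of a block given as (x, y, width, height); the 1:2:1 ternary ratio shown is an assumption for illustration.

def split_block(x, y, w, h, mode):
    """Return child rectangles for a quad, binary, or triple tree split."""
    if mode == "quad":  # four equal quadrants
        return [(x, y, w // 2, h // 2), (x + w // 2, y, w // 2, h // 2),
                (x, y + h // 2, w // 2, h // 2),
                (x + w // 2, y + h // 2, w // 2, h // 2)]
    if mode == "binary_horizontal":  # two halves, split across the height
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "binary_vertical":  # two halves, split across the width
        return [(x, y, w // 2, h), (x + w // 2, y, w // 2, h)]
    if mode == "triple_vertical":  # 1:2:1 split; the center is not divided
        return [(x, y, w // 4, h), (x + w // 4, y, w // 2, h),
                (x + 3 * w // 4, y, w // 4, h)]
    raise ValueError("unknown split mode")

children = split_block(0, 0, 32, 32, "triple_vertical")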

In some examples, the video coder can use a single QTBT or MTT structure to represent each of the luminance and chrominance components, while in other examples, the video coder can use two or more QTBT or MTT structures, such as one QTBT or MTT structure for the luminance component and another QTBT or MTT structure for both chrominance components (or two QTBT and/or MTT structures for respective chrominance components).

The video coder can be configured to use quadtree partitioning per HEVC, QTBT partitioning, MTT partitioning, or other partitioning structures. For illustrative purposes, the description herein may refer to QTBT partitioning. However, it should be understood that the techniques of the disclosure may also be applied to video coders configured to use quadtree partitioning, or other types of partitioning as well.

In some examples, the one or more slices of a picture are assigned a slice type. Slice types include an intra-coded slice (I-slice), an inter-coded P-slice, and an inter-coded B-slice. An I-slice (intra-coded frames, independently decodable) is a slice of a picture that is only coded by intra-prediction, and therefore is independently decodable since the I-slice requires only the data within the frame to predict any prediction unit or prediction block of the slice. A P-slice (uni-directional predicted frames) is a slice of a picture that may be coded with intra-prediction and with uni-directional inter-prediction. Each prediction unit or prediction block within a P-slice is either coded with intra-prediction or inter-prediction. When the inter-prediction applies, the prediction unit or prediction block is only predicted by one reference picture, and therefore reference samples are only from one reference region of one frame. A B-slice (bi-directional predictive frames) is a slice of a picture that may be coded with intra-prediction and with inter-prediction (e.g., either bi-prediction or uni-prediction). A prediction unit or prediction block of a B-slice may be bi-directionally predicted from two reference pictures, where each picture contributes one reference region and sample sets of the two reference regions are weighted (e.g., with equal weights or with different weights) to produce the prediction signal of the bi-directional predicted block. As explained above, slices of one picture are independently coded. In some cases, a picture can be coded as just one slice.

As noted above, intra-picture prediction utilizes the correlation between spatially neighboring samples within a picture. There are a plurality of intra-prediction modes (also referred to as “intra modes”). In some examples, the intra prediction of a luma block includes 35 modes, including the Planar mode, DC mode, and 33 angular modes (e.g., diagonal intra prediction modes and angular modes adjacent to the diagonal intra prediction modes). The 35 modes of the intra prediction are indexed as shown in Table 1 below. In other examples, more intra modes may be defined including prediction angles that may not already be represented by the 33 angular modes. In other examples, the prediction angles associated with the angular modes may be different from those used in HEVC.

TABLE 1. Specification of intra prediction mode and associated names

Intra-prediction mode    Associated name
0                        INTRA_PLANAR
1                        INTRA_DC
2 . . . 34               INTRA_ANGULAR2 . . . INTRA_ANGULAR34

Inter-picture prediction uses the temporal correlation between pictures in order to derive a motion-compensated prediction for a block of image samples. Using a translational motion model, the position of a block in a previously decoded picture (a reference picture) is indicated by a motion vector (Δx, Δy), with Δx specifying the horizontal displacement and Δy specifying the vertical displacement of the reference block relative to the position of the current block. In some cases, a motion vector (Δx, Δy) can be in integer sample accuracy (also referred to as integer accuracy), in which case the motion vector points to the integer-pel grid (or integer-pixel sampling grid) of the reference frame. In some cases, a motion vector (Δx, Δy) can be of fractional sample accuracy (also referred to as fractional-pel accuracy or non-integer accuracy) to more accurately capture the movement of the underlying object, without being restricted to the integer-pel grid of the reference frame. Accuracy of motion vectors may be expressed by the quantization level of the motion vectors. For example, the quantization level may be integer accuracy (e.g., 1-pixel) or fractional-pel accuracy (e.g., ¼-pixel, ½-pixel, or other sub-pixel value). Interpolation is applied on reference pictures to derive the prediction signal when the corresponding motion vector has fractional sample accuracy. For example, samples available at integer positions can be filtered (e.g., using one or more interpolation filters) to estimate values at fractional positions. The previously decoded reference picture is indicated by a reference index (refIdx) to a reference picture list. The motion vectors and reference indices can be referred to as motion parameters. Two kinds of inter-picture prediction can be performed, including uni-prediction and bi-prediction.
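
As an illustrative sketch of fractional-pel motion compensation, the following fetches one prediction sample for a quarter-pel motion vector, using bilinear interpolation as a simple stand-in for the longer interpolation filters that codecs actually define.

def sample_at_fractional_position(ref, x, y, mvx_q4, mvy_q4):
    """Fetch a prediction sample at a 1/4-pel motion vector (illustrative).

    ref is a 2-D list of reference samples; mvx_q4 and mvy_q4 are the motion
    vector components in quarter-pel units.
    """
    fx = (mvx_q4 & 3) / 4.0          # fractional part of the displacement
    fy = (mvy_q4 & 3) / 4.0
    ix = x + (mvx_q4 >> 2)           # integer-pel position in the reference
    iy = y + (mvy_q4 >> 2)
    a, b = ref[iy][ix], ref[iy][ix + 1]
    c, d = ref[iy + 1][ix], ref[iy + 1][ix + 1]
    top = a + fx * (b - a)           # horizontal, then vertical interpolation
    bottom = c + fx * (d - c)
    return top + fy * (bottom - top)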

With inter-prediction using bi-prediction, two sets of motion parameters (Δx₀, Δy₀, refIdx₀ and Δx₁, Δy₁, refIdx₁) are used to generate two motion compensated predictions (from the same reference picture or possibly from different reference pictures). For example, with bi-prediction, each prediction block uses two motion compensated prediction signals, and generates B prediction units. The two motion compensated predictions are combined to get the final motion compensated prediction. For example, the two motion compensated predictions can be combined by averaging. In another example, weighted prediction can be used, in which case different weights can be applied to each motion compensated prediction. The reference pictures that can be used in bi-prediction are stored in two separate lists, denoted as list 0 and list 1. Motion parameters can be derived at the encoder using a motion estimation process.
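
A minimal sketch of combining the two motion compensated predictions, covering both simple averaging (equal weights) and weighted prediction (unequal weights); the rounding offset is an illustrative assumption.

def bi_prediction(pred0, pred1, w0=1, w1=1):
    """Combine two motion compensated predictions (given as lists of rows).

    Equal weights give simple averaging; unequal weights illustrate
    weighted prediction.
    """
    denom = w0 + w1
    return [[(w0 * p0 + w1 * p1 + denom // 2) // denom
             for p0, p1 in zip(row0, row1)]
            for row0, row1 in zip(pred0, pred1)]

# Average of two 2x2 prediction blocks from list 0 and list 1.
combined = bi_prediction([[100, 104], [98, 96]], [[110, 100], [102, 92]])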

With inter-prediction using uni-prediction, one set of motion parameters (Δx₀, Δy₀, refIdx₀) is used to generate a motion compensated prediction from a reference picture. For example, with uni-prediction, each prediction block uses at most one motion compensated prediction signal, and generates P prediction units.

A PU may include the data (e.g., motion parameters or other suitable data) related to the prediction process. For example, when the PU is encoded using intra-prediction, the PU may include data describing an intra-prediction mode for the PU. As another example, when the PU is encoded using inter-prediction, the PU may include data defining a motion vector for the PU. The data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector (Δx), a vertical component of the motion vector (Δy), a resolution for the motion vector (e.g., integer precision, one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, a reference index, a reference picture list (e.g., List 0, List 1, or List C) for the motion vector, or any combination thereof.

After performing prediction using intra- and/or inter-prediction, the encoding device 104 can perform transformation and quantization. For example, following prediction, the encoder engine 106 may calculate residual values corresponding to the PU. Residual values may comprise pixel difference values between the current block of pixels being coded (the PU) and the prediction block used to predict the current block (e.g., the predicted version of the current block). For example, after generating a prediction block (e.g., using inter-prediction or intra-prediction), the encoder engine 106 can generate a residual block by subtracting the prediction block produced by a prediction unit from the current block. The residual block includes a set of pixel difference values that quantify differences between pixel values of the current block and pixel values of the prediction block. In some examples, the residual block may be represented in a two-dimensional block format (e.g., a two-dimensional matrix or array of pixel values). In such examples, the residual block is a two-dimensional representation of the pixel values.
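
As a trivial illustration of the residual computation described above (per-sample subtraction of the prediction block from the current block):

def residual_block(current, prediction):
    """Per-sample difference between the current block and its prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, prediction)]

res = residual_block([[52, 55], [61, 59]], [[50, 54], [60, 60]])
# res == [[2, 1], [1, -1]]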

Any residual data that may be remaining after prediction is performed is transformed using a block transform, which may be based on discrete cosine transform (DCT), discrete sine transform (DST), an integer transform, a wavelet transform, other suitable transform function, or any combination thereof. In some cases, one or more block transforms (e.g., a kernel of size 32×32, 16×16, 8×8, 4×4, or other suitable size) may be applied to residual data in each CU. In some examples, a TU may be used for the transform and quantization processes implemented by the encoder engine 106. A given CU having one or more PUs may also include one or more TUs. As described in further detail below, the residual values may be transformed into transform coefficients using the block transforms, and may be quantized and scanned using TUs to produce serialized transform coefficients for entropy coding.

In some embodiments, following intra-predictive or inter-predictive coding using PUs of a CU, the encoder engine 106 may calculate residual data for the TUs of the CU. The PUs may comprise pixel data in the spatial domain (or pixel domain). As previously noted, the residual data may correspond to pixel difference values between pixels of the unencoded picture and prediction values corresponding to the PUs. The encoder engine 106 may form one or more TUs including the residual data for a CU (which includes the PUs), and may transform the TUs to produce transform coefficients for the CU. The TUs may comprise coefficients in the transform domain following application of a block transform.

The encoder engine 106 may perform quantization of the transform coefficients. Quantization provides further compression by quantizing the transform coefficients to reduce the amount of data used to represent the coefficients. For example, quantization may reduce the bit depth associated with some or all of the coefficients. In one example, a coefficient with an n-bit value may be rounded down to an m-bit value during quantization, with n being greater than m.
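
A minimal sketch of the bit-depth reduction example above, rounding an n-bit coefficient down to an m-bit value by discarding low-order bits; the quantization actually performed by a codec also involves a quantization parameter and scaling, which are omitted here.

def reduce_bit_depth(coeff, n=16, m=8):
    """Round an n-bit coefficient down to an m-bit value (illustrative)."""
    shift = n - m                    # number of low-order bits discarded
    return coeff >> shift            # quantized (m-bit) value

def restore_scale(quantized, n=16, m=8):
    """Approximate inverse scaling performed at the decoder side."""
    return quantized << (n - m)

q = reduce_bit_depth(3071)          # 3071 -> 11
approx = restore_scale(q)           # 2816; the quantization error is 255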

Once quantization is performed, the coded video bitstream includes quantized transform coefficients, prediction information (e.g., prediction modes, motion vectors, block vectors, or the like), partitioning information, and any other suitable data, such as other syntax data. The different elements of the coded video bitstream may be entropy encoded by the encoder engine 106. In some examples, the encoder engine 106 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In some examples, encoder engine 106 may perform an adaptive scan. After scanning the quantized transform coefficients to form a vector (e.g., a one-dimensional vector), the encoder engine 106 may entropy encode the vector. For example, the encoder engine 106 may use context adaptive variable length coding, context adaptive binary arithmetic coding, syntax-based context-adaptive binary arithmetic coding, probability interval partitioning entropy coding, or another suitable entropy encoding technique.
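
To illustrate the idea of a predefined scan order, the following serializes a square coefficient block along anti-diagonals so that low-frequency coefficients come first; actual codecs define their own exact scan orders, so this ordering is an illustrative assumption.

def diagonal_scan(block):
    """Serialize a square coefficient block along anti-diagonals (illustrative)."""
    n = len(block)
    order = sorted(((x, y) for y in range(n) for x in range(n)),
                   key=lambda p: (p[0] + p[1], p[1]))
    return [block[y][x] for x, y in order]

scanned = diagonal_scan([[9, 3, 1, 0], [4, 2, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]])
# Low-frequency coefficients come first; the trailing zeros group at the end.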

The output 110 of the encoding device 104 may send the NAL units making up the encoded video bitstream data over the communications link 120 to the decoding device 112 of the receiving device. The input 114 of the decoding device 112 may receive the NAL units. The communications link 120 may include a channel provided by a wireless network, a wired network, or a combination of a wired and wireless network. A wireless network may include any wireless interface or combination of wireless interfaces and may include any suitable wireless network (e.g., the Internet or other wide area network, a packet-based network, WiFi™, radio frequency (RF), UWB, WiFi-Direct, cellular, Long-Term Evolution (LTE), WiMax™, or the like). A wired network may include any wired interface (e.g., fiber, ethernet, powerline ethernet, ethernet over coaxial cable, digital signal line (DSL), or the like). The wired and/or wireless networks may be implemented using various equipment, such as base stations, routers, access points, bridges, gateways, switches, or the like. The encoded video bitstream data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the receiving device.

In some examples, the encoding device 104 may store encoded video bitstream data in storage 108. The output 110 may retrieve the encoded video bitstream data from the encoder engine 106 or from the storage 108. Storage 108 may include any of a variety of distributed or locally accessed data storage media. For example, the storage 108 may include a hard drive, a storage disc, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. The storage 108 can also include a decoded picture buffer (DPB) for storing reference pictures for use in inter-prediction. In a further example, the storage 108 can correspond to a file server or another intermediate storage device that may store the encoded video generated by the source device. In such cases, the receiving device including the decoding device 112 can access stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to the receiving device. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. The receiving device may access the encoded video data through any standard data connection, including an Internet connection. The access may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the storage 108 may be a streaming transmission, a download transmission, or a combination thereof.

The input 114 of the decoding device 112 receives the encoded video bitstream data and may provide the video bitstream data to the decoder engine 116, or to storage 118 for later use by the decoder engine 116. For example, the storage 118 can include a DPB for storing reference pictures for use in inter-prediction. The receiving device including the decoding device 112 can receive the encoded video data to be decoded via the storage 108. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to the receiving device. The communication medium for transmitting the encoded video data can comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from the source device to the receiving device.

The decoder engine 116 may decode the encoded video bitstream data by entropy decoding (e.g., using an entropy decoder) and extracting the elements of one or more coded video sequences making up the encoded video data. The decoder engine 116 may rescale and perform an inverse transform on the encoded video bitstream data. Residual data is passed to a prediction stage of the decoder engine 116. The decoder engine 116 predicts a block of pixels (e.g., a PU). In some examples, the prediction is added to the output of the inverse transform (the residual data).

The video decoding device 112 may output the decoded video to a video destination device 122, which may include a display or other output device for displaying the decoded video data to a consumer of the content. In some aspects, the video destination device 122 may be part of the receiving device that includes the decoding device 112. In some aspects, the video destination device 122 may be part of a separate device other than the receiving device.

In some embodiments, the video encoding device 104 and/or the video decoding device 112 may be integrated with an audio encoding device and audio decoding device, respectively. The video encoding device 104 and/or the video decoding device 112 may also include other hardware or software that is necessary to implement the coding techniques described above, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. The video encoding device 104 and the video decoding device 112 may be integrated as part of a combined encoder/decoder (codec) in a respective device.

The example system shown in FIG. 1 is one illustrative example that can be used herein. Techniques for processing video data using the techniques described herein can be performed by any digital video encoding and/or decoding device. Although generally the techniques of the disclosure are performed by a video encoding device or a video decoding device, the techniques may also be performed by a combined video encoder-decoder, typically referred to as a “CODEC.” Moreover, the techniques of the disclosure may also be performed by a video preprocessor. The source device and the receiving device are merely examples of such coding devices in which the source device generates coded video data for transmission to the receiving device. In some examples, the source and receiving devices may operate in a substantially symmetrical manner such that each of the devices includes video encoding and decoding components. Hence, example systems may support one-way or two-way video transmission between video devices, e.g., for video streaming, video playback, video broadcasting, or video telephony.

Extensions to the HEVC standard include the Multiview Video Coding extension, referred to as MV-HEVC, and the Scalable Video Coding extension, referred to as SHVC. The MV-HEVC and SHVC extensions share the concept of layered coding, with different layers being included in the encoded video bitstream. Each layer in a coded video sequence is addressed by a unique layer identifier (ID). A layer ID may be present in a header of a NAL unit to identify a layer with which the NAL unit is associated. In MV-HEVC, different layers usually represent different views of the same scene in the video bitstream. In SHVC, different scalable layers are provided that represent the video bitstream in different spatial resolutions (or picture resolution) or in different reconstruction fidelities. The scalable layers may include a base layer (with layer ID=0) and one or more enhancement layers (with layer IDs=1, 2, . . . n). The base layer may conform to a profile of the first version of HEVC, and represents the lowest available layer in a bitstream. The enhancement layers have increased spatial resolution, temporal resolution or frame rate, and/or reconstruction fidelity (or quality) as compared to the base layer. The enhancement layers are hierarchically organized and may (or may not) depend on lower layers. In some examples, the different layers may be coded using a single standard codec (e.g., all layers are encoded using HEVC, SHVC, or other coding standard). In some examples, different layers may be coded using a multi-standard codec. For example, a base layer may be coded using AVC, while one or more enhancement layers may be coded using SHVC and/or MV-HEVC extensions to the HEVC standard.

As described above, for each block, a set of motion information (alsoreferred to herein as motion parameters) can be available. A set ofmotion information can contain motion information for forward andbackward prediction directions. Here, forward and backward predictiondirections are two prediction directions of a bi-directional predictionmode and the terms “forward” and “backward” do not necessarily have ageometry meaning. Instead, forward and backward can correspond to areference picture list 0 (RefPicList0) and a reference picture list 1(RefPicList1) of a current picture, slice, or block. In some examples,when only one reference picture list is available for a picture, slice,or block, only RefPicList0 is available and the motion information ofeach block of a slice is always forward. In some examples, RefPicList0includes reference pictures that precede a current picture in time, andRefPicList1 includes reference pictures that follow the current picturein time. In some cases, a motion vector together with an associatedreference index can be used in decoding processes. Such a motion vectorwith the associated reference index is denoted as a uni-predictive setof motion information.

For each prediction direction, the motion information can contain a reference index and a motion vector. In some cases, for simplicity, a motion vector can be referred to in a way that assumes it has an associated reference index. A reference index can be used to identify a reference picture in the current reference picture list (RefPicList0 or RefPicList1). A motion vector can have a horizontal and a vertical component that provide an offset from the coordinate position in the current picture to the coordinates in the reference picture identified by the reference index. For example, a reference index can indicate a particular reference picture that should be used for a block in a current picture, and the motion vector can indicate where in the reference picture the best-matched block (the block that best matches the current block) is located.

A picture order count (POC) can be used in video coding standards to identify the display order of a picture. Although there are cases in which two pictures within one coded video sequence may have the same POC value, this does not occur often within one coded video sequence. When multiple coded video sequences are present in a bitstream, pictures with the same POC value may be closer to each other in terms of decoding order. POC values of pictures can be used for reference picture list construction, derivation of the reference picture set as in HEVC, and/or motion vector scaling, among other things.

In H.264/AVC, each inter-macroblock (MB) may be partitioned in one of four different ways, including: one 16×16 macroblock partition; two 16×8 macroblock partitions; two 8×16 macroblock partitions; and four 8×8 macroblock partitions, among others. Different macroblock partitions in one macroblock may have different reference index values for each prediction direction (e.g., different reference index values for RefPicList0 and RefPicList1).

In some cases, when a macroblock is not partitioned into four 8×8macroblock partitions, the macroblock can have only one motion vectorfor each macroblock partition in each prediction direction. In somecases, when a macroblock is partitioned into four 8×8 macroblockpartitions, each 8×8 macroblock partition can be further partitionedinto sub-blocks, each of which can have a different motion vector ineach prediction direction. An 8×8 macroblock partition can be dividedinto sub-blocks in different ways, including: one 8×8 sub-block; two 8×4sub-blocks; two 4×8 sub-blocks; and four 4×4 sub-blocks, among others.Each sub-block can have a different motion vector in each predictiondirection. Therefore, a motion vector can be present in a level equal toor higher than a sub-block.

In HEVC, the largest coding unit in a slice is called a coding treeblock (CTB) or coding tree unit (CTU). A CTB contains a quad-tree, thenodes of which are coding units. The size of a CTB can range from 16×16pixels to 64×64 pixels in the HEVC main profile. In some cases, 8×8pixel CTB sizes can be supported. A CTB may be recursively split intocoding units (CU) in a quad-tree manner. A CU could be the same size asa CTB and as small as 8×8 pixels. In some cases, each coding unit iscoded with one mode, such as either intra-prediction mode orinter-prediction mode. When a CU is inter-coded using aninter-prediction mode, the CU may be further partitioned into two orfour prediction units (PUs), or may be treated as one PU when furtherpartitioning does not apply. When two PUs are present in one CU, the twoPUs can be half size rectangles or two rectangles that are ¼ or ¾ thesize of the CU.

When the CU is inter-coded, one set of motion information can be presentfor each PU, which can be derived with a unique inter-prediction mode.For example, each PU can be coded with one inter-prediction mode toderive the set of motion information. In some cases, when a CU isintra-coded using intra-prediction mode, the PU shapes can be 2N×2N andN×N. Within each PU, a single intra-prediction mode is coded (whilechroma prediction mode is signalled at the CU level). In some cases, theN×N intra PU shapes are allowed when the current CU size is equal to thesmallest CU size defined in SPS.

For motion prediction in HEVC, there are two inter-prediction modes fora prediction unit (PU), including merge mode and advanced motion vectorprediction (AMVP) mode. Skip is considered as a special case of merge.In either AMVP mode or merge mode, a motion vector (MV) candidate listis maintained for multiple motion vector predictors. The motionvector(s), as well as reference indices in the merge mode, of thecurrent PU are generated by taking one candidate from the MV candidatelist.

In some examples, the MV candidate list contains up to five candidates for the merge mode and two candidates for the AMVP mode. In other examples, different numbers of candidates can be included in a MV candidate list for merge mode and/or AMVP mode. A merge candidate may contain a set of motion information. For example, a set of motion information can include motion vectors corresponding to both reference picture lists (list 0 and list 1) and the reference indices. If a merge candidate is identified by a merge index, the reference pictures to be used for the prediction of the current block are determined, as are the associated motion vectors. Under AMVP mode, however, for each potential prediction direction from either list 0 or list 1, a reference index needs to be explicitly signaled, together with an MV predictor (MVP) index to the MV candidate list, since the AMVP candidate contains only a motion vector. In AMVP mode, the predicted motion vectors can be further refined.

A merge candidate may correspond to a full set of motion information,while an AMVP candidate may contain one motion vector for a specificprediction direction and a reference index. The candidates for bothmodes are derived similarly from the same spatial and temporalneighboring blocks.

In some examples, merge mode allows an inter-predicted PU to inherit the same motion vector or vectors, prediction direction, and reference picture index or indices from an inter-predicted PU that includes a motion data position selected from a group of spatially neighboring motion data positions and one of two temporally co-located motion data positions. For AMVP mode, the motion vector or vectors of a PU can be predictively coded relative to one or more motion vector predictors (MVPs) from an AMVP candidate list constructed by an encoder. In some instances, for single direction inter-prediction of a PU, the encoder can generate a single AMVP candidate list. In some instances, for bi-directional prediction of a PU, the encoder can generate two AMVP candidate lists, one using motion data of spatial and temporal neighboring PUs from the forward prediction direction and one using motion data of spatial and temporal neighboring PUs from the backward prediction direction.

The candidates for both modes can be derived from spatial and/ortemporal neighboring blocks. For example, FIG. 2A and FIG. 2B includeconceptual diagrams illustrating spatial neighboring candidates in HEVC.FIG. 2A illustrates spatial neighboring motion vector (MV) candidatesfor merge mode. FIG. 2B illustrates spatial neighboring motion vector(MV) candidates for AMVP mode. Spatial MV candidates are derived fromthe neighboring blocks for a specific PU (PU0), although the methodsgenerating the candidates from the blocks differ for merge and AMVPmodes.

In merge mode, the encoder can form a merging candidate list byconsidering merging candidates from various motion data positions. Forexample, as shown in FIG. 2A, up to four spatial MV candidates can bederived with respect to spatially neighboring motion data positionsshown with numbers 0-4 in FIG. 2A. The MV candidates can be ordered inthe merging candidate list in the order shown by the numbers 0-4. Forexample, the positions and order can include: left position (0), aboveposition (1), above right position (2), below left position (3), andabove left position (4).

In AMVP mode, shown in FIG. 2B, the neighboring blocks are divided into two groups: a left group including blocks 0 and 1, and an above group including blocks 2, 3, and 4. For each group, the potential candidate in a neighboring block referring to the same reference picture as that indicated by the signaled reference index has the highest priority to be chosen to form a final candidate of the group. It is possible that none of the neighboring blocks contains a motion vector pointing to the same reference picture. Therefore, if such a candidate cannot be found, the first available candidate is scaled to form the final candidate, so that the temporal distance differences can be compensated.
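As an illustration of the per-group scan just described, the following is a minimal Python sketch (hypothetical data structures, not a conformant implementation) that picks a final AMVP candidate from a group of neighboring blocks, falling back to POC-distance scaling when no neighbor refers to the signaled reference picture:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Neighbor:
    mv: Tuple[int, int]      # motion vector in 1/4-pel units
    ref_poc: int             # POC of the reference picture this neighbor points to
    available: bool = True

def scale_mv(mv, cur_poc, target_ref_poc, neigh_ref_poc):
    """Scale an MV by the ratio of POC distances (simplified, no clipping)."""
    num = cur_poc - target_ref_poc
    den = cur_poc - neigh_ref_poc
    if den == 0:
        return mv
    return (round(mv[0] * num / den), round(mv[1] * num / den))

def pick_group_candidate(group: List[Neighbor], cur_poc: int,
                         target_ref_poc: int) -> Optional[Tuple[int, int]]:
    # First pass: prefer a neighbor that already uses the signaled reference picture.
    for n in group:
        if n.available and n.ref_poc == target_ref_poc:
            return n.mv
    # Fallback: scale the first available neighbor to compensate the POC distance.
    for n in group:
        if n.available:
            return scale_mv(n.mv, cur_poc, target_ref_poc, n.ref_poc)
    return None

# Example: left group {block 0, block 1}, current picture POC 8, target reference POC 4.
left_group = [Neighbor(mv=(6, -2), ref_poc=2), Neighbor(mv=(4, 0), ref_poc=4)]
print(pick_group_candidate(left_group, cur_poc=8, target_ref_poc=4))  # -> (4, 0)
```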

FIG. 3A and FIG. 3B include conceptual diagrams illustrating temporalmotion vector prediction in HEVC. A temporal motion vector predictor(TMVP) candidate, if enabled and available, is added into a MV candidatelist after spatial motion vector candidates. The process of motionvector derivation for a TMVP candidate is the same for both merge andAMVP modes. In some instances, however, the target reference index forthe TMVP candidate in the merge mode is always set to zero.

The primary block location for TMVP candidate derivation is the bottomright block outside of the collocated PU, as shown in FIG. 3A as a block“T”, to compensate for the bias to the above and left blocks used togenerate spatial neighboring candidates. However, if that block islocated outside of the current CTB (or LCU) row or motion information isnot available, the block is substituted with a center block of the PU. Amotion vector for a TMVP candidate is derived from the co-located PU ofthe co-located picture, indicated in the slice level. Similar totemporal direct mode in AVC, a motion vector of the TMVP candidate maybe subject to motion vector scaling, which is performed to compensatefor distance differences.

Other aspects of motion prediction are also covered in HEVC, VVC, and other video coding specifications. For example, one aspect includes motion vector scaling. In motion vector scaling, the value of a motion vector is assumed to be proportional to the distance between pictures in presentation time. In some examples, a first motion vector can be associated with two pictures, including a first reference picture and a first containing picture which includes the first motion vector. The first motion vector can be utilized to predict a second motion vector. For predicting the second motion vector, a first distance between the first containing picture and the first reference picture of the first motion vector can be calculated based on Picture Order Count (POC) values associated with the first reference picture and the first containing picture.

A second reference picture and a second containing picture may be associated with the second motion vector to be predicted, where the second reference picture can be different from the first reference picture and the second containing picture can be different from the first containing picture. A second distance can be calculated between the second reference picture and the second containing picture based on POC values associated with the second reference picture and the second containing picture, where the second distance can be different from the first distance. For predicting the second motion vector, the first motion vector can be scaled based on the first distance and the second distance. For a spatially neighboring candidate, the first containing picture and the second containing picture of the first motion vector and the second motion vector, respectively, can be the same, while the first reference picture and the second reference picture may be different. In some examples, the motion vector scaling can be applied for TMVP and AMVP modes, for the spatial and temporal neighboring candidates.
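As a small worked example of the scaling described above (a sketch only; conformant codecs use fixed-point arithmetic and clipping), the first motion vector can be scaled by the ratio of the two POC distances:

```python
def scale_motion_vector(mv, containing_poc_1, reference_poc_1,
                        containing_poc_2, reference_poc_2):
    """Scale mv (given for picture pair 1) to predict a vector for picture pair 2.

    The scale factor is the ratio of the second POC distance to the first,
    applied to both MV components.
    """
    distance_1 = containing_poc_1 - reference_poc_1
    distance_2 = containing_poc_2 - reference_poc_2
    if distance_1 == 0:
        return mv  # degenerate case; real codecs handle this explicitly
    scale = distance_2 / distance_1
    return (round(mv[0] * scale), round(mv[1] * scale))

# The first MV spans a POC distance of 2 (POC 8 -> POC 6); the second spans 4 (POC 8 -> POC 4),
# so the predicted vector is twice as long.
print(scale_motion_vector((3, -1), 8, 6, 8, 4))  # -> (6, -2)
```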

Another aspect of motion prediction includes artificial motion vector candidate generation. For example, if a motion vector candidate list is not complete, artificial motion vector candidates are generated and inserted at the end of the motion vector candidate list until all candidates are obtained. In merge mode, there are two types of artificial MV candidates: a first type, which includes combined candidates derived only for B-slices; and a second type, which includes zero candidates used only for AMVP if the first type does not provide sufficient artificial candidates. For each pair of candidates that are already in the motion vector candidate list and that have the relevant motion information, bi-directional combined motion vector candidates can be derived by combining the motion vector of the first candidate referring to a picture in list 0 and the motion vector of a second candidate referring to a picture in list 1.

Another aspect of merge and AMVP modes includes a pruning process forcandidate insertion. For example, candidates from different blocks mayhappen to be the same, which decreases the efficiency of a merge and/orAMVP candidate list. A pruning process can be applied to solve theproblem. The pruning process includes comparing a candidate against thecandidates already present in the current candidate list to avoidinserting identical or duplicate candidates. To reduce the complexity ofthe comparison, the pruning process can be performed for less than allpotential candidates to be inserted in the candidate list.
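A minimal sketch of such a pruning check (assuming a simple candidate representation; real codecs compare full motion information and typically limit how many pairs are compared) might look like the following:

```python
def prune_and_insert(candidate_list, new_candidate, max_compare=4):
    """Insert new_candidate only if it does not duplicate the candidates already checked.

    Only the first max_compare existing candidates are compared, mirroring the idea of
    limiting pruning complexity by comparing fewer than all potential candidates.
    """
    for existing in candidate_list[:max_compare]:
        if existing == new_candidate:
            return False  # duplicate found, do not insert
    candidate_list.append(new_candidate)
    return True

merge_list = [((2, 1), 0), ((0, 0), 1)]        # (motion vector, reference index) pairs
prune_and_insert(merge_list, ((2, 1), 0))       # duplicate, rejected
prune_and_insert(merge_list, ((4, -3), 0))      # new candidate, inserted
print(merge_list)
```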

In some examples, enhanced motion vector predictions can be implemented.For instance, some inter coding tools are specified in video codingstandards such as VVC, according to which the candidate list of motionvector prediction or merge prediction for a current block can be derivedor refined. Examples of such approaches are described below.

A history-based motion vector prediction (HMVP) is a motion vector prediction method that allows each block to find its MV predictor from a list of MVs decoded in the past, in addition to those in immediately adjacent causal neighboring motion fields. For example, using HMVP, one or more MV predictors for a current block can be obtained or predicted from a list of previously decoded MVs in addition to those in immediately adjacent causally neighboring motion fields. The MV predictors in the list of previously decoded MVs are referred to as HMVP candidates. The HMVP candidates can include motion information associated with inter-coded blocks. An HMVP table with multiple HMVP candidates can be maintained during an encoding and/or decoding process for a slice. In some examples, the HMVP table can be dynamically updated. For example, after decoding an inter-coded block, the HMVP table can be updated by adding the associated motion information of the decoded inter-coded block to the HMVP table as a new HMVP candidate. In some examples, the HMVP table can be emptied when a new slice is encountered.

In some cases, whenever there is an inter-coded block, the associated motion information can be inserted into the table in a first-in-first-out (FIFO) fashion as a new HMVP candidate. A constrained FIFO rule can be applied. When inserting an HMVP candidate into the table, a redundancy check can first be applied to determine whether an identical HMVP candidate is already in the table. If one is found, that particular HMVP candidate is removed from the table and all the HMVP candidates after it are moved forward.
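The following Python sketch illustrates this constrained FIFO update (a simplified model; the actual table size and the definition of identical motion information follow the codec specification):

```python
class HMVPTable:
    """History-based MV predictor table with a constrained FIFO update rule."""

    def __init__(self, max_size=5):
        self.max_size = max_size
        self.candidates = []  # index 0 holds the most recently added candidate

    def update(self, motion_info):
        # Redundancy check: if an identical candidate exists, remove it first
        # so that the new occurrence becomes the most recent entry.
        if motion_info in self.candidates:
            self.candidates.remove(motion_info)
        elif len(self.candidates) == self.max_size:
            self.candidates.pop()          # drop the oldest candidate
        self.candidates.insert(0, motion_info)

table = HMVPTable()
for mv in [(1, 0), (2, -1), (1, 0), (0, 3)]:   # motion info of decoded inter blocks
    table.update(mv)
print(table.candidates)                         # most recent first: [(0, 3), (1, 0), (2, -1)]
```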

In some examples, HMVP candidates can be used in the merge candidatelist construction process. In some cases, all HMVP candidates from thelast entry to the first entry in the table are inserted after the TMVPcandidate. Pruning can be applied on the HMVP candidates. Once the totalnumber of available merge candidates reaches the signaled maximallyallowed merge candidates, the merge candidate list construction processcan be terminated.

In some examples, HMVP candidates can be used in the AMVP candidate listconstruction process. In some cases, the motion vectors of the last KHMVP candidates in the table are inserted after the TMVP candidate. Insome implementations, only HMVP candidates with the same referencepicture as the AMVP target reference picture are used to construct theAMVP candidate list. Pruning can be applied on the HMVP candidates.

FIG. 4 is a block diagram illustrating an example of an HMVP table 400. The HMVP table 400 can be implemented as a storage device managed using a First-In-First-Out (FIFO) rule. For example, HMVP candidates which include MV predictors can be stored in the HMVP table 400. The HMVP candidates can be stored in the order in which they are encoded or decoded. In an example, the order in which the HMVP candidates are stored in the HMVP table 400 can correspond to the time at which the HMVP candidates are constructed. For example, when implemented in a decoder such as the decoding device 112, an HMVP candidate can be constructed to include the motion information of a decoded inter-coded block. In some examples, one or more HMVP candidates from the HMVP table 400 can include the motion vector predictors which can be used for motion vector prediction of a current block to be decoded. In some examples, one or more HMVP candidates can include motion information of previously decoded blocks, which can be stored in the time order in which those blocks were decoded, in one or more entries of the HMVP table 400 in a FIFO manner.

An HMVP candidate index 402 is shown to be associated with the HMVP table 400. The HMVP candidate index 402 can identify the one or more entries of the HMVP table 400. The HMVP candidate index 402 is shown to include the index values 0 to 4 according to an illustrative example, where each of the index values of the HMVP candidate index 402 is associated with a corresponding entry. The HMVP table 400 can include more or fewer entries than those shown and described with reference to FIG. 4 in other examples. As HMVP candidates are constructed, they are populated in the HMVP table 400 in a FIFO manner. For example, as the HMVP candidates are decoded, they are inserted into the HMVP table 400 at one end and moved sequentially through the entries of the HMVP table 400 until they exit the HMVP table 400 from the other end. Accordingly, a memory structure such as a shift register can be used to implement the HMVP table 400 in some examples. In an example, the index value 0 can point to a first entry of the HMVP table 400, where the first entry can correspond to a first end of the HMVP table 400 at which the HMVP candidates are inserted. Correspondingly, the index value 4 can point to a second entry of the HMVP table 400, where the second entry can correspond to a second end of the HMVP table 400 from which the HMVP candidates exit or are emptied from the HMVP table 400. Accordingly, an HMVP candidate which is inserted at the first entry at the index value 0 can traverse the HMVP table 400, making room for newer or more recently decoded HMVP candidates, until the HMVP candidate reaches the second entry at the index value 4. Thus, among the HMVP candidates present in the HMVP table 400 at any given time, the HMVP candidate in the second entry at the index value 4 may be the oldest or least recent, while the HMVP candidate in the first entry at the index value 0 may be the youngest or most recent. In general, the HMVP candidate in the second entry may be an older or less recently constructed HMVP candidate than the HMVP candidate in the first entry.

In FIG. 4, different states of the HMVP table 400 are identified with the reference numerals 400A, 400B, and 400C. Referring to the state for reference numeral 400A, HMVP candidates HMVP0 to HMVP4 are shown to be present in entries of the HMVP table 400 at respective index values 4 to 0. For example, HMVP0 may be the oldest or least recent HMVP candidate, which was inserted into the HMVP table 400 at the first entry at the index value 0. HMVP0 may have been shifted sequentially to make room for the more recently inserted and newer HMVP candidates HMVP1 to HMVP4, until HMVP0 reached the second entry at the index value 4 shown in the state for reference numeral 400A. Correspondingly, HMVP4 may be the most recent HMVP candidate to be inserted in the first entry at the index value 0. Thus, HMVP0 is an older or less recent HMVP candidate in the HMVP table 400 in relation to HMVP4.

In some examples, one or more of the HMVP candidates HMVP0 to HMVP4 caninclude motion vector information which can be redundant. For example, aredundant HMVP candidate can include motion vector information which isidentical to the motion vector information in one or more other HMVPcandidates stored in the HMVP table 400. Since the motion vectorinformation of the redundant HMVP candidate can be obtained from the oneor more other HMVP candidates, storing the redundant HMVP candidate inthe HMVP table 400 can be avoided. By avoiding the redundant HMVPcandidates from being stored in the HMVP table 400, resources of theHMVP table 400 can be utilized more efficiently. In some examples, priorto storing an HMVP candidate in the HMVP table 400, a redundancy checkcan be performed to determine whether the HMVP candidate would beredundant (e.g., the motion vector information of the HMVP candidate canbe compared to the motion vector information of the other HMVPcandidates already stored to determine whether there is a match).

In some examples, the state for reference numerals 400B of the HMVPtable 400 is a conceptual illustration of the above-described redundancycheck. In some examples, the HMVP candidates can be populated in theHMVP table 400 as they are decoded, and the redundancy check can beperformed periodically, rather than being performed as a threshold testbefore the HMVP candidates are stored. For example, as shown in thestate for reference numerals 400B, the HMVP candidates HMVP1 and HMVP3can be identified as redundant candidates (i.e., their motioninformation is identical to that of one of the other HMVP candidates inthe HMVP table 400). The redundant HMVP candidates HMVP1 and HMVP3 canbe removed and the remaining HMVP candidates can be shifted accordingly.

For example, as shown in the state for reference numerals 400C, the HMVPcandidates HMVP2 and HMVP4 are shifted towards higher index values whichcorrespond to older entries, while HMVP0 which is already in the secondentry at the end of the HMVP table 400 is not shown to be shiftedfurther. In some examples, shifting the HMVP candidates HMVP2 and HMVP4can free up space in the HMVP table 400 for newer HMVP candidates.Accordingly, new HMVP candidates HMVP5 and HMVP6 are shown to be shiftedinto the HMVP table 400, with HMVP6 being the newest or including themost recently decoded motion vector information, and stored in the firstentry at the index value 0.

In some examples, one or more of the HMVP candidates from the HMVP table400 can be used for constructing other candidate lists which can be usedfor motion prediction of the current block. For example, one or moreHMVP candidates from the HMVP table 400 can be added to a mergecandidate list, e.g., as additional merge candidates. In some examples,one or more HMVP candidates from the same HMVP table 400 or another suchHMVP table can be added to an Advanced Motion Vector Prediction (AMVP)candidate list, e.g., as additional AMVP predictors.

For example, in a merge candidate list construction process some or allof the HMVP candidates stored in the entries of the HMVP table 400 canbe inserted in the merge candidate list. In some examples, inserting theHMVP candidates in the merge candidate list can include inserting theHMVP candidates after a temporal motion vector predictor (TMVP)candidate in the merge candidate list. As previously discussed withreference to FIG. 3A and FIG. 3B, the TMVP candidate, if enabled andavailable, can be added into a MV candidate list after spatial motionvector candidates.

In some examples, the above-described pruning process can be applied onthe HMVP candidates in constructing the merge candidate list. Forexample, once a total number of merge candidates in the merge candidatelist reaches maximum number of allowable merge candidates, the mergecandidate list construction process can be terminated, and no more HMVPcandidates may be inserted into the merge candidate list. The maximumnumber of allowable merge candidates in the merge candidate list can bea predetermined number or a number which may be signaled, e.g., from anencoder to a decoder at which the merge candidate list may beconstructed.
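As a simplified sketch of this ordering and termination rule (hypothetical data structures; assuming the spatial candidates and the TMVP candidate have already been derived, and using a simple equality check for pruning), HMVP candidates can be appended after the TMVP candidate until the maximum number of allowed merge candidates is reached:

```python
def build_merge_list(spatial_candidates, tmvp_candidate, hmvp_candidates, max_num_merge_cand):
    """Construct a merge candidate list: spatial candidates, then TMVP, then HMVP candidates."""
    merge_list = []
    for cand in spatial_candidates + ([tmvp_candidate] if tmvp_candidate else []):
        if len(merge_list) >= max_num_merge_cand:
            return merge_list
        if cand not in merge_list:           # pruning against candidates already in the list
            merge_list.append(cand)
    # HMVP candidates are inserted after the TMVP candidate, with pruning,
    # until the maximum number of allowed merge candidates is reached.
    for cand in hmvp_candidates:
        if len(merge_list) >= max_num_merge_cand:
            break
        if cand not in merge_list:
            merge_list.append(cand)
    return merge_list

spatial = [((1, 0), 0), ((0, 2), 1)]
tmvp = ((1, 1), 0)
hmvp = [((1, 0), 0), ((3, -2), 0), ((0, 5), 1)]   # example HMVP table contents
print(build_merge_list(spatial, tmvp, hmvp, max_num_merge_cand=5))
```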

In some examples of constructing the merge candidate list, one or more other candidates can be inserted in the merge candidate list. In some examples, the motion information of previously coded blocks which may not be adjacent to the current block can be utilized for more efficient motion vector prediction. For example, non-adjacent spatial merge candidates can be used in constructing the merge candidate list. In some cases, the construction of non-adjacent spatial merge candidates (e.g., described in JVET-K0228, which is hereby incorporated by reference in its entirety and for all purposes) involves derivation of new spatial candidates from two non-adjacent neighboring positions (e.g., from the closest non-adjacent block to the left and above, as illustrated in FIG. 5 and discussed below). The blocks can be limited to within a maximum distance of 1 CTU from the current block. The fetching process for non-adjacent candidates starts by tracing the previously decoded blocks in the vertical direction. The vertical inverse tracing stops when an inter block is encountered or the traced back distance reaches 1 CTU size. The fetching process then traces the previously decoded blocks in the horizontal direction. The criterion for stopping the horizontal fetching process depends on whether a vertical non-adjacent candidate was successfully fetched. If no vertical non-adjacent candidate is fetched, the horizontal fetching process stops when an inter block is encountered or the traced back distance exceeds one CTU size threshold. If a vertical non-adjacent candidate is fetched, the horizontal fetching process stops when an inter block which contains a different MV from the vertical non-adjacent candidate is encountered, or when the traced back distance exceeds one CTU size threshold. In some examples, the non-adjacent spatial merge candidates can be inserted before the TMVP candidate in the merge candidate list. In some examples, the non-adjacent spatial merge candidates can be inserted before the TMVP candidate in the same merge candidate list which can include one or more of the HMVP candidates inserted after the TMVP candidate. Identifying and fetching one or more non-adjacent spatial merge candidates which can be inserted into the merge candidate list is described with reference to FIG. 5 below.

FIG. 5 is a block diagram illustrating a picture or slice 500 which includes a current block 502 to be coded. In some examples, a merge candidate list can be constructed for coding the current block 502. For example, motion vectors for the current block can be obtained from one or more merge candidates in the merge candidate list. Constructing the merge candidate list can include determining non-adjacent spatial merge candidates. For example, the non-adjacent spatial merge candidates can include new spatial candidates derived from two non-adjacent neighboring positions relative to the current block 502.

Several adjacent or neighboring blocks of the current block 502 are shown, including an above left block B₂ 510 (above and to the left of the current block 502), an above block B₁ 512 (above the current block 502), an above right block B₀ 514 (above and to the right of the current block 502), a left block A₁ 516 (to the left of the current block 502), and a left below block A₀ 518 (to the left of and below the current block 502). In some examples, the non-adjacent spatial merge candidates can be obtained from the closest non-adjacent block above and/or to the left of the current block.

In some examples, determining non-adjacent spatial merge candidates for the current block 502 can include tracing previously decoded blocks in a vertical direction (above the current block 502) and/or in a horizontal direction (to the left of the current block 502). A vertical traced back distance 504 indicates the vertical distance between the current block 502 (e.g., a top boundary of the current block 502) and a vertical non-adjacent block V_(N) 520. A horizontal traced back distance 506 indicates the horizontal distance between the current block 502 (e.g., a left boundary of the current block 502) and a horizontal non-adjacent block H_(N) 522. The vertical traced back distance 504 and the horizontal traced back distance 506 are restrained to a maximum distance equal to the size of one coding tree unit (CTU).

Non-adjacent spatial merge candidates such as the vertical non-adjacent block V_(N) 520 and the horizontal non-adjacent block H_(N) 522 can be identified by tracing the previously decoded blocks in the vertical direction and the horizontal direction, respectively. For example, fetching the vertical non-adjacent block V_(N) 520 can include a vertical inverse tracing process to determine whether an inter coded block exists within the vertical traced back distance 504 (constrained to a maximum size of one CTU). If such a block exists, it is identified as the vertical non-adjacent block V_(N) 520. In some examples, a horizontal inverse tracing process may be performed subsequent to the vertical inverse tracing process. The horizontal inverse tracing process can include determining whether an inter coded block exists within the horizontal traced back distance 506 (constrained to a maximum size of one CTU), and if such a block is found, it is identified as the horizontal non-adjacent block H_(N) 522.

In some examples, one or more of the vertical non-adjacent block V_(N) 520 and the horizontal non-adjacent block H_(N) 522 can be fetched for use as non-adjacent spatial merge candidates. A fetching process can include fetching the vertical non-adjacent block V_(N) 520 if the vertical non-adjacent block V_(N) 520 is identified in the vertical inverse tracing process. The fetching process can then proceed to the horizontal inverse tracing process. If the vertical non-adjacent block V_(N) 520 is not identified in the vertical inverse tracing process, the horizontal inverse tracing process can be terminated when an inter coded block is encountered or the horizontal traced back distance 506 exceeds the maximum distance. If the vertical non-adjacent block V_(N) 520 is identified and fetched, the horizontal inverse tracing process is terminated when an inter coded block is encountered which contains a different MV than the MV contained in the vertical non-adjacent block V_(N) 520, or if the horizontal traced back distance 506 exceeds the maximum distance. As previously noted, one or more of the fetched non-adjacent spatial merge candidates, such as the vertical non-adjacent block V_(N) 520 and the horizontal non-adjacent block H_(N) 522, are added before the TMVP candidate in the merge candidate list.

Referring back to FIG. 4, in some cases, the HMVP candidates can also beused in constructing an AMVP candidate list. In an AMVP candidate listconstruction process, some or all of the HMVP candidates stored in theentries of the same HMVP table 400 (or a different HMVP table than theone used for the merge candidate list construction) can be inserted inthe AMVP candidate list. In some examples, inserting the HMVP candidatesin the AMVP candidate list can include inserting a set of entries (e.g.,a number of k most recent or least recent entries) of the HMVPcandidates after the TMVP candidate in the AMVP candidate list. In someexamples, the above-described pruning process can be applied on the HMVPcandidates in constructing the AMVP candidate list. In some examples,only those HMVP candidates with a reference picture which is the same asan AMVP target reference picture may be used to construct the AMVPcandidate list.

Accordingly, the history-based motion vector predictor (HMVP) predictionmode can involve the use of a history-based lookup table such as theHMVP table 400 which includes one or more HMVP candidates. The HMVPcandidates can be used in inter-prediction modes, such as the merge modeand the AMVP mode. In some examples, different inter-prediction modescan use different methods to select HMVP candidates from the HMVP table400.

In some cases, alternative motion vector prediction designs can be used.For example, alternative designs for spatial MVP (S-MVP) prediction andtemporal MVP (T-MVP) prediction can be utilized. For instance, in someimplementations of merge mode (in some cases merge mode can be referredto as Skip mode or Direct mode), the spatial and temporal MVP candidatesshown in FIG. 6A, FIG. 6B, and FIG. 6C can be visited (or searched orselected) in the given order shown in the figures to fill the MVP list.

FIG. 6A illustrates locations of MVP candidates A, B, (C, A1|B1), A0, B2 for current block 600. FIG. 6B illustrates temporally collocated neighbors at center position 610 with fallback candidates H for current block 600. Spatial and temporal locations utilized in MVP prediction are as shown in FIG. 6A. An example of the visiting order (e.g., search order or selection order) for S-MVP is shown in FIG. 6C with search ordered blocks 0, 1, 2, 3, 4, and 5. A spatially inverted alternative pattern (as compared to the order in FIG. 6C) for search ordered blocks 0-5 is shown in FIG. 6D.

The spatial neighbors utilized as MVP candidates are A, B, (C, A1|B1), A0, and B2, which are visited in a two-stage process, with the visiting order marked in FIG. 6C:

1. Group 1:
   a. A, B, C (collocated with B0 in HEVC notation)
   b. A1 or B1, depending on the availability of an MVP in the C location and the type of block partitioning.

2. Group 2:
   a. A0 and B2

Temporally collocated neighbors utilized as MVP candidates are the block collocated at the center position 610 of the current block and the block at the most bottom-right location outside of the current block:

3. Group 3:
   a. C, H
   b. If the H location is found to be outside of the collocated picture, fallback H positions can be used instead.

In some implementations, depending on the block partitioning used and the coding order, an inverse S-MVP candidate order can be used, as shown in FIG. 6D.

In HEVC and earlier video coding standards, only a translational motion model is applied for motion compensation prediction (MCP). For example, a translational motion vector can be determined for each block (e.g., each CU or each PU) of a picture. However, in the real world, there are many kinds of motion other than translational motion, including zooming (e.g., zooming in and/or out), rotation, and perspective motions, among other irregular motions. In the Joint Exploration Model (JEM) by ITU-T VCEG and MPEG, an affine transform motion compensation prediction can be applied to improve coding efficiency using an affine coding mode.

FIG. 7 is a diagram which illustrates an affine motion field of a current block 702 described by two motion vectors shown as vector 720 ({right arrow over (v)}₀) and vector 722 ({right arrow over (v)}₁) at two corresponding control points 710 and 712. Using the motion vector 720 {right arrow over (v)}₀ of the control point 710 and the motion vector 722 {right arrow over (v)}₁ of the control point 712, the motion vector field (MVF) of the current block 702 can be described by the following equation:

$$\begin{cases} v_{x} = \dfrac{(v_{1x} - v_{0x})}{w}x - \dfrac{(v_{1y} - v_{0y})}{w}y + v_{0x} \\[4pt] v_{y} = \dfrac{(v_{1y} - v_{0y})}{w}x + \dfrac{(v_{1x} - v_{0x})}{w}y + v_{0y} \end{cases} \qquad \text{Equation (1)}$$

In equation (1), v_(x) and v_(y) form the motion vector for each pixelwithin the current block 702, x and y provide the position of each pixelwithin the current block 702 (e.g., the top-left pixel in a block canhave coordinate or index (x, y)=(0,0)), and (v_(0x), v_(0y)) is themotion vector of the top-left corner control point 710, w is the widthof the current block 702, and (v_(1x), v_(1y)) is the motion vector 722of the top-right corner control point 712. The v_(0x) and v_(1x) valuesare horizontal values for the respective motion vectors, and v_(0y) andv_(1y) values are the vertical values for the respective motion vectors.Additional control points (e.g., four control points, six controlpoints, eight control points, or some other number of control points)can be defined by adding additional control point vectors, for exampleat the lower corners of the current block 702, the center of the currentblock 702, or other position in the current block 702.

Equation (1) above illustrates a 4-parameters motion model, where thefour affine parameters a, b, c, and d are defined as:

$$a = \frac{(v_{1x} - v_{0x})}{w}; \quad b = \frac{(v_{1y} - v_{0y})}{w}; \quad c = v_{0x}; \quad d = v_{0y}$$

Using equation (1), given the motion vector (v_(0x), v_(0y)) of the top-left corner control point 710 and the motion vector (v_(1x), v_(1y)) of the top-right corner control point 712, the motion vector for every pixel of the current block can be calculated using the coordinate (x, y) of each pixel location. For instance, for the top-left pixel position of the current block 702, the value of (x, y) can be equal to (0, 0), in which case the motion vector for the top-left pixel becomes v_(x)=v_(0x) and v_(y)=v_(0y). In order to further simplify the MCP, block-based affine transform prediction can be applied.
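To make equation (1) concrete, the following Python sketch (illustrative only, using floating point rather than the fixed-point arithmetic of a real codec) evaluates the 4-parameter motion field for an arbitrary sample position from the two control point motion vectors:

```python
def affine_mv_4param(x, y, v0, v1, width):
    """Evaluate equation (1): the 4-parameter affine motion field at sample (x, y).

    v0 = (v0x, v0y) is the MV of the top-left control point,
    v1 = (v1x, v1y) is the MV of the top-right control point,
    width is the width w of the current block.
    """
    a = (v1[0] - v0[0]) / width
    b = (v1[1] - v0[1]) / width
    vx = a * x - b * y + v0[0]
    vy = b * x + a * y + v0[1]
    return vx, vy

# The top-left sample (0, 0) simply reproduces the top-left control point MV.
print(affine_mv_4param(0, 0, v0=(2.0, 1.0), v1=(4.0, 3.0), width=16))   # -> (2.0, 1.0)
print(affine_mv_4param(8, 8, v0=(2.0, 1.0), v1=(4.0, 3.0), width=16))   # -> (2.0, 3.0)
```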

FIG. 8 is a diagram which illustrates block-based affine transform prediction of a current block 802 (e.g., which can be similar to current block 600 or current block 702) divided into sub-blocks, including illustrated sub-blocks 804, 806, and 808. The example shown in FIG. 8 includes a 4×4 partition, with sixteen total sub-blocks. Any suitable partition and corresponding number of sub-blocks can be used in other examples. A motion vector can be derived for each sub-block using equation (1). In some examples, to derive the motion vector of each of the 4×4 sub-blocks, the motion vector of the center sample of each sub-block (as shown in FIG. 8) is calculated according to equation (1), as illustrated by motion vector 805 derived for sub-block 804, motion vector 807 derived for sub-block 806, and motion vector 809 derived for sub-block 808, each from a center sample of the corresponding sub-block. In other examples, other samples can be used. In some examples, each resulting motion vector can be rounded, for example to a 1/16 fraction accuracy or other suitable accuracy (e.g., ¼, ⅛, or the like). Motion compensation can be applied using the derived motion vectors of the sub-blocks to generate the prediction of each sub-block. For example, a decoding device can receive the four affine parameters (a, b, c, d) describing the motion vector {right arrow over (v)}₀ 820 of the control point 810 and the motion vector {right arrow over (v)}₁ 822 of the control point 812, and can calculate the per-sub-block motion vector according to the pixel coordinate index describing the location of the center sample of each sub-block. After MCP, the high accuracy motion vector of each sub-block can be rounded, as noted above, and can be saved at the same accuracy as the translational motion vector. In addition, in some examples, the motion vectors in an affine mode can be limited in order to limit the reference data to be used during affine coding operations that use the motion vectors as affine motion vectors in an affine coding mode. In some such examples, clipping can be applied to such vectors, as described in more detail below, particularly with respect to FIGS. 18A, 18B, and 18C.
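A possible sketch of this sub-block derivation (a simplified model assuming square sub-blocks, a center sample at half the sub-block size, and simple rounding to 1/16-pel accuracy) is:

```python
def subblock_affine_mvs(block_w, block_h, sub_size, v0, v1):
    """Derive one MV per sub-block from its center sample (equation (1)), rounded to 1/16 pel."""
    a = (v1[0] - v0[0]) / block_w
    b = (v1[1] - v0[1]) / block_w
    mvs = {}
    for sy in range(0, block_h, sub_size):
        for sx in range(0, block_w, sub_size):
            cx, cy = sx + sub_size / 2, sy + sub_size / 2      # center sample of the sub-block
            vx = a * cx - b * cy + v0[0]
            vy = b * cx + a * cy + v0[1]
            mvs[(sx, sy)] = (round(vx * 16) / 16, round(vy * 16) / 16)
    return mvs

# A 16x16 block with 4x4 sub-blocks yields sixteen sub-block motion vectors.
mvs = subblock_affine_mvs(16, 16, 4, v0=(2.0, 1.0), v1=(4.0, 3.0))
print(len(mvs), mvs[(0, 0)])
```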

FIG. 9 is a diagram illustrating an example of motion vector predictionin affine inter (AF_INTER) mode. In JEM, there are two affine motionmodes: affine inter (AF_INTER) mode and affine merge (AF_MERGE) mode. Insome examples, when a CU has a width and height larger than 8 pixels,AF_INTER mode can be applied. An affine flag can be placed (or signaled)in the bitstream in relation to a block (e.g., at the CU level), toindicate whether AF_INTER mode was applied to the block. In the exampleof FIG. 9, in AF_INTER mode, a candidate list of motion vector pairs canbe constructed using neighboring blocks. For example, for a sub-block910, located in the upper left corner of a current block 902, a motionvector v₀ can be selected from a neighboring block 920 above and to theleft of the sub-block 910, neighboring block B 922 above the sub-block910, and neighboring block C 924 to the left of the sub-block 910. As afurther example, for a sub-block 912, located in the upper right cornerof the current block 902, a motion vector v₁ can be selected fromneighboring block D 926 and neighboring block E 928 in the above and theabove-right directions, respectively. A candidate list of motion vectorpairs can be constructed using the neighboring blocks. For example,given motion vectors v_(A), v_(B), v_(C), v_(D), and v_(E) correspondingto blocks A 920, B 922, C 924, D 926, and E 928, respectively, thecandidate list of motion vector pairs can be expressed as {(v₀,v₁)|v₀={v_(A), v_(B), v_(C)}, v₁={v_(D), v_(E)}}.

As noted above and as shown in FIG. 9, in AF_INTER mode, the motionvector v₀ can be selected from the motion vectors of the blocks A 920, B922, or C 924. The motion vector from the neighboring block (e.g., blockA, B, or C) can be scaled according to the reference list and therelationship among the POC of the reference for the neighboring block,the POC of the reference for the current CU (e.g., the current block902), and the POC of the current CU. In these examples, some or all ofthe POCs can be determined from a reference list. Selection of v₁ fromthe neighboring blocks D 926 or E 928 is similar to the selection of v₀.

In some cases, if the number of candidates in the candidate list is less than two, the candidate list can be padded with motion vector pairs formed by duplicating each of the AMVP candidates. When the candidate list contains more than two candidates, in some examples, the candidates in the candidate list can first be sorted according to the consistency of the neighboring motion vectors (e.g., consistency can be based on the similarity between the two motion vectors in a motion vector pair candidate). In such examples, the first two candidates are kept and the rest may be discarded.

In some examples, a rate-distortion (RD) cost check can be used to determine which motion vector pair candidate is selected as the control point motion vector prediction (CPMVP) of the current CU (e.g., the current block 902). In some cases, an index indicating the position of the CPMVP in the candidate list can be signaled (or otherwise indicated) in the bitstream. Once the CPMVP of the current affine CU is determined (based on the motion vector pair candidate), affine motion estimation can be applied, and the control point motion vector (CPMV) can be determined. In some cases, the difference between the CPMV and the CPMVP can be signaled in the bitstream. Both the CPMV and the CPMVP include two sets of translational motion vectors, in which case the signaling cost of affine motion information is higher than that of translational motion.

FIG. 10A and FIG. 10B illustrate an example of motion vector prediction in AF_MERGE mode. When a current block 1002 (e.g., a CU) is coded using AF_MERGE mode, a motion vector can be obtained from a valid neighboring reconstructed block. For example, the first block from the valid neighboring reconstructed blocks that is coded with affine mode can be selected as the candidate block. As shown in FIG. 10A, the neighboring block can be selected from among a set of neighboring blocks A 1020, B 1022, C 1024, D 1026, and E 1028. The neighboring blocks may be considered in a particular selection order for being selected as the candidate block. One example of a selection order is the left neighbor (e.g., block A 1020), followed by the above neighbor (block B 1022), the above right neighbor (block C 1024), the left bottom neighbor (block D 1026), and the above left neighbor (block E 1028).

As noted above, the neighboring block that is selected can be the first block (e.g., in the selection order) that has been coded with affine mode. For example, block A 1020 may have been coded in affine mode. As illustrated in FIG. 10B, block A 1020 can be included in a neighboring CU 1004. For the neighboring CU 1004, motion vectors for the top left corner (v₂ 1030), above right corner (v₃ 1032), and left bottom corner (v₄ 1034) of the neighboring CU 1004 may have been derived. In this example, a control point motion vector, v₀ 1040, for the top left corner of the current block 1002 is calculated according to v₂ 1030, v₃ 1032, and v₄ 1034. The control point motion vector, v₁ 1042, for the top right corner of the current block 1002 can then be determined.

Once the control point motion vectors (CPMV), v₀ 1040 and v₁ 1042, ofthe current block 1002 have been derived, equation (1) can be applied todetermine a motion vector field for the current block 1002. In order toidentify whether the current block 1002 is coded with AF_MERGE mode, anaffine flag can be included in the bitstream when there is at least oneneighboring block coded in affine mode.

In many cases, the process of affine motion estimation includesdetermining affine motion for a block at the encoder side by minimizingthe distortion between the original block and the affine motionpredicted block. As affine motion has more parameters than translationalmotion, affine motion estimation can be more complicated thantranslational motion estimation. In some cases, a fast affine motionestimation method based on Taylor expansion of signal can be performedto determine the affine motion parameters (e.g., affine motionparameters a, b, c, d in a 4-parameters model).

The fast affine motion estimation can include a gradient-based affinemotion search. For example, given a pixel value I_(t) at time t (with t0being the time of the reference picture), the first order Taylorexpansion for the pixel value I_(t) can be determined as:

$$I_{t} = I_{t_{0}} + \frac{\partial I_{t_{0}}}{\partial t}(t - t_{0}) = I_{t_{0}} + \frac{\partial I_{t_{0}}}{\partial x} \cdot \frac{\partial x}{\partial t} \cdot (t - t_{0}) + \frac{\partial I_{t_{0}}}{\partial y} \cdot \frac{\partial y}{\partial t} \cdot (t - t_{0}) \qquad \text{Equation (2)}$$

where $\frac{\partial I_{t_{0}}}{\partial x}$ and $\frac{\partial I_{t_{0}}}{\partial y}$ are the pixel gradients G_(x0) and G_(y0) in the x and y directions, respectively, while $\frac{\partial x}{\partial t} \cdot (t - t_{0})$ and $\frac{\partial y}{\partial t} \cdot (t - t_{0})$ indicate the motion vector components V_(x) and V_(y) for the pixel value I_(t). The motion vector for the pixel I_(t) in the current block points to a pixel I_(t0) in the reference picture.

Equation (2) can be rewritten as equation (3) as follows:

$$I_{t} = I_{t_{0}} + G_{x0} \cdot V_{x} + G_{y0} \cdot V_{y} \qquad \text{Equation (3)}$$

The affine motion V_(x) and V_(y) for the pixel value I_(t) can be solved by minimizing the distortion between the prediction (I_(t0)+G_(x0)·V_(x)+G_(y0)·V_(y)) and the original signal. Taking the 4-parameters affine model as an example:

$$V_{x} = a \cdot x - b \cdot y + c \qquad \text{Equation (4)}$$

$$V_{y} = b \cdot x + a \cdot y + d \qquad \text{Equation (5)}$$

where x and y indicate the position of a pixel or sub-block. Substituting equations (4) and (5) into equation (3), and minimizing the distortion between the original signal and the prediction using equation (3), the solution for the affine parameters a, b, c, d can be determined:

$$\{a, b, c, d\} = \arg\min \left\{ \sum_{i \in \text{current template}} \left( I_{t}^{i} - I_{t_{0}}^{i} - G_{x0}^{i} \cdot (a \cdot x - b \cdot y + c) - G_{y0}^{i} \cdot (b \cdot x + a \cdot y + d) \right)^{2} \right\} \qquad \text{Equation (6)}$$

Once the affine motion parameters are determined, which define theaffine motion vectors for the control points, the per-pixel orper-sub-block motion vectors can be determined using the affine motionparameters (e.g., using equations (4) and (5), which are alsorepresented in equation (1)). Equation (3) can be performed for everypixel of a current block (e.g., a CU). For example, if a current blockis 16 pixels×16 pixels, the least squares solution in equation (6) canbe used to derive the affine motion parameters (a, b, c, d) for thecurrent block by minimizing the overall value over the 256 pixels.
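The following Python sketch (a toy illustration of the gradient-based least-squares idea, not the encoder's actual search) sets up and solves equation (6) for the 4-parameter model over a block of pixels using numpy:

```python
import numpy as np

def estimate_affine_4param(diff, grad_x, grad_y):
    """Least-squares estimate of (a, b, c, d) from per-pixel data.

    diff   : I_t - I_t0 for each pixel (H x W array)
    grad_x : horizontal gradient G_x0 of the reference (H x W array)
    grad_y : vertical gradient G_y0 of the reference (H x W array)
    Solves diff ~ G_x0*(a*x - b*y + c) + G_y0*(b*x + a*y + d) in the least-squares sense.
    """
    h, w = diff.shape
    ys, xs = np.mgrid[0:h, 0:w]
    gx, gy = grad_x.ravel(), grad_y.ravel()
    x, y = xs.ravel(), ys.ravel()
    # Columns correspond to the unknowns a, b, c, d.
    A = np.stack([gx * x + gy * y,      # coefficient of a
                  -gx * y + gy * x,     # coefficient of b
                  gx,                   # coefficient of c
                  gy], axis=1)          # coefficient of d
    params, *_ = np.linalg.lstsq(A, diff.ravel(), rcond=None)
    return params  # array([a, b, c, d])

# Synthetic check: build diff from known parameters and recover them.
rng = np.random.default_rng(0)
gx, gy = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))
ys, xs = np.mgrid[0:16, 0:16]
true = (0.02, -0.01, 0.5, -0.25)
diff = gx * (true[0]*xs - true[1]*ys + true[2]) + gy * (true[1]*xs + true[0]*ys + true[3])
print(np.round(estimate_affine_4param(diff, gx, gy), 4))
```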

Any number of parameters can be used in affine motion models for videodata. For instance, a 6-parameters affine motion or other affine motioncan be solved in the same way as that described above for the4-parameters affine motion model. For example, a 6-parameters affinemotion model can be described as:

$$\begin{cases} v_{x} = ax + by + e \\ v_{y} = cx + dy + f \end{cases} \qquad \text{Equation (7)}$$

In equation (7), (v_(x), v_(y)) is the motion vector at the coordinate(x, y), and a, b, c, d, e, and f are the six affine parameters. Theaffine motion model for a block can also be described by the threemotion vectors (MVs) {right arrow over (v)}₀=(v_(0x), v_(0y)), {rightarrow over (v)}₁=(v_(1x), v_(1y)), and {right arrow over (v)}₂=(v_(2x),v_(2y)) at three corners of the block.

FIG. 11 is a diagram which illustrates an affine motion field of a current block 1102 described by three motion vectors 1120, 1122, and 1124 at three corresponding control points 1110, 1112, and 1114. The motion vector 1120 (e.g., {right arrow over (v)}₀) is at the control point 1110 located at the top-left corner of the current block 1102, the motion vector 1122 (e.g., {right arrow over (v)}₁) is at the control point 1112 located at the top-right corner of the current block 1102, and the motion vector 1124 (e.g., {right arrow over (v)}₂) is at the control point 1114 located at the bottom-left corner of the current block 1102. The motion vector field (MVF) of the current block 1102 can be described by the following equation:

$$\begin{cases} v_{x} = \dfrac{(v_{1x} - v_{0x})}{w}x + \dfrac{(v_{2x} - v_{0x})}{h}y + v_{0x} \\[4pt] v_{y} = \dfrac{(v_{1y} - v_{0y})}{w}x + \dfrac{(v_{2y} - v_{0y})}{h}y + v_{0y} \end{cases} \qquad \text{Equation (8)}$$

Equation (8) represents a 6-parameters affine motion model, where w and h are the width and height of the current block 1102.
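As with the 4-parameter case, equation (8) can be evaluated directly; a minimal floating-point sketch (illustrative only) is:

```python
def affine_mv_6param(x, y, v0, v1, v2, width, height):
    """Evaluate equation (8): the 6-parameter affine motion field at sample (x, y).

    v0, v1, v2 are the MVs at the top-left, top-right, and bottom-left control points.
    """
    vx = (v1[0] - v0[0]) / width * x + (v2[0] - v0[0]) / height * y + v0[0]
    vy = (v1[1] - v0[1]) / width * x + (v2[1] - v0[1]) / height * y + v0[1]
    return vx, vy

# The bottom-left sample (0, height) reproduces the bottom-left control point MV.
print(affine_mv_6param(0, 16, v0=(2.0, 1.0), v1=(4.0, 3.0), v2=(1.0, 5.0), width=16, height=16))
```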

While a 4-parameters motion model was described with reference toequation (1) above, a simplified 4-parameters affine model using thewidth and the height of the current block can be described by thefollowing equation:

$$\begin{cases} v_{x} = ax - by + e \\ v_{y} = bx + ay + f \end{cases} \qquad \text{Equation (9)}$$

The simplified 4-parameters affine model for a block based on equation(9) can be described by two motion vectors {right arrow over(v)}₀=(v_(0x), v_(0y)) and {right arrow over (v)}₁=(v_(1x), v_(1y)) attwo of four corners of the block. The motion field can be described as:

$$\begin{cases} v_{x} = \dfrac{(v_{1x} - v_{0x})}{w}x - \dfrac{(v_{1y} - v_{0y})}{h}y + v_{0x} \\[4pt] v_{y} = \dfrac{(v_{1y} - v_{0y})}{w}x + \dfrac{(v_{1x} - v_{0x})}{h}y + v_{0y} \end{cases} \qquad \text{Equation (10)}$$

As previously mentioned, the motion vector {right arrow over (v)}_(i) isreferred to herein as a control point motion vector (CPMV). The CPMVsfor the 4-parameters affine motion model are not necessarily the same asthe CPMVs for the 6-parameters affine motion model. In some examples,different CPMVs can be selected for the affine motion model.

FIG. 12 is a diagram which illustrates selection of control pointvectors for an affine motion model of a current block 1202. Four controlpoints 1210, 1212, 1214, and 1216 are illustrated for the current block1202. The motion vector 1220 (e.g., {right arrow over (v)}₀) is at thecontrol point 1210 located at the top-left corner of the current block1202, the motion vector 1222 (e.g., {right arrow over (v)}₁) is at thecontrol point 1212 located at the top-right corner of the current block1202, the motion vector 1224 (e.g., {right arrow over (v)}₂) is at thecontrol point 1214 located at the bottom-left corner of the currentblock 1202, and the motion vector 1226 (e.g., {right arrow over (v)}₃)is at the control point 1216 located at the bottom-right corner of thecurrent block 1202.

In an example, for a 4-parameters affine motion model (according to either equation (1) or equation (10)), the control point pairs can be selected from any two of the four motion vectors {{right arrow over (v)}₀, {right arrow over (v)}₁, {right arrow over (v)}₂, {right arrow over (v)}₃}. In another example, for a 6-parameters affine motion model, the control point pairs can be selected from any three of the four motion vectors {{right arrow over (v)}₀, {right arrow over (v)}₁, {right arrow over (v)}₂, {right arrow over (v)}₃}. Based on the selected control point motion vectors, the other motion vectors for the current block 1202 can be calculated, for example, using the derived affine motion model.

In some examples, alternative affine motion model representations canalso be used. For instance, an affine motion model based on delta motionvectors can be represented by an anchor motion vector {right arrow over(v)}₀ at a coordinate (x₀, y₀), a horizontal delta motion vector ∇{rightarrow over (v)}_(h), and a vertical delta motion vector ∇{right arrowover (v)}_(v). In general, a motion vector {right arrow over (v)} at thecoordinate (x, y) can be calculated as {right arrow over (v)}={rightarrow over (v)}₀+x*∇{right arrow over (v)}_(h)+y*∇{right arrow over(v)}_(v).

In some examples, the affine motion model representation based on CPMVs can be converted to the alternative affine motion model representation with delta motion vectors. For example, {right arrow over (v)}₀ in the delta motion vector affine motion model representation is the same as the top-left CPMV, ∇{right arrow over (v)}_(h)=({right arrow over (v)}₁−{right arrow over (v)}₀)/w, and ∇{right arrow over (v)}_(v)=({right arrow over (v)}₂−{right arrow over (v)}₀)/h. It is to be noted that for these vector operations, the addition, division, and multiplication are applied element-wise.
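A brief Python sketch of this conversion and of evaluating a motion vector from the delta representation (element-wise vector arithmetic, as noted above; illustrative only, with the anchor assumed at coordinate (0, 0)) is:

```python
def cpmv_to_delta(v0, v1, v2, width, height):
    """Convert three corner CPMVs into (anchor, horizontal delta, vertical delta)."""
    dvh = ((v1[0] - v0[0]) / width,  (v1[1] - v0[1]) / width)   # per-sample horizontal change
    dvv = ((v2[0] - v0[0]) / height, (v2[1] - v0[1]) / height)  # per-sample vertical change
    return v0, dvh, dvv

def delta_mv_at(x, y, anchor, dvh, dvv):
    """v = v0 + x * dv_h + y * dv_v, with the anchor at coordinate (0, 0)."""
    return (anchor[0] + x * dvh[0] + y * dvv[0],
            anchor[1] + x * dvh[1] + y * dvv[1])

anchor, dvh, dvv = cpmv_to_delta(v0=(2.0, 1.0), v1=(4.0, 3.0), v2=(1.0, 5.0), width=16, height=16)
print(delta_mv_at(16, 0, anchor, dvh, dvv))   # reproduces the top-right CPMV (4.0, 3.0)
```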

In some examples, affine motion vector prediction can be performed usingaffine motion predictors. In some examples, the affine motion predictorsfor a current block can be derived from the affine motion vectors ornormal motion vectors of the neighboring coded blocks. As describedabove, the affine motion predictors can include inherited affine motionvector predictors (e.g., inherited using affine merge (AF_MERGE) mode)and constructed affine motion vector predictors (e.g., constructed usingaffine inter (AF_INTER) mode).

An inherited affine motion vector predictor (MVP) uses one or moreaffine motion vectors of a neighboring coded block to derive thepredicted CPMVs of a current block. The inherited affine MVP is based onan assumption that the current block shares the same affine motion modelas the neighboring coded block. The neighboring coded block is referredto as a neighboring block or a candidate block. The neighboring blockcan be selected from different spatial or temporal neighboringlocations.

FIG. 13 is a diagram illustrating an inherited affine MVP of a current block 1302 from a neighboring block 1304 (block A). The affine motion vectors of the neighboring block 1304 are represented in terms of the respective motion vectors 1330, 1332, and 1334 {{right arrow over (v)}₀, {right arrow over (v)}₁, {right arrow over (v)}₂} at the control points 1320, 1322, and 1324 as follows: {right arrow over (v)}₀=(v_(0x), v_(0y)), {right arrow over (v)}₁=(v_(1x), v_(1y)), {right arrow over (v)}₂=(v_(2x), v_(2y)). In an example, the size of the neighboring block 1304 can be represented by the parameters (w, h), where w is the width and h is the height of the neighboring block 1304. The coordinates of the control points of the neighboring block 1304 are represented as (x0, y0), (x1, y1), and (x2, y2). The affine motion vectors 1340, 1342, and 1344, represented as {right arrow over (v)}′₀=(v_(0x)′, v_(0y)′), {right arrow over (v)}′₁=(v_(1x)′, v_(1y)′), and {right arrow over (v)}′₂=(v_(2x)′, v_(2y)′), can be predicted for the current block 1302 at the respective control points 1310, 1312, and 1314. The predicted affine motion vectors {right arrow over (v)}′₀=(v_(0x)′, v_(0y)′), {right arrow over (v)}′₁=(v_(1x)′, v_(1y)′), {right arrow over (v)}′₂=(v_(2x)′, v_(2y)′) for the current block 1302 can be derived by replacing (x, y) in equation (8) with the coordinate difference between the control points of the current block 1302 and the top-left control point of the neighboring block 1304, as described in the following equations:

$\begin{matrix}\begin{cases}v_{0x}^{\prime} = \frac{(v_{1x} - v_{0x})}{w}(x0^{\prime} - x0) + \frac{(v_{2x} - v_{0x})}{h}(y0^{\prime} - y0) + v_{0x} \\ v_{0y}^{\prime} = \frac{(v_{1y} - v_{0y})}{w}(x0^{\prime} - x0) + \frac{(v_{2y} - v_{0y})}{h}(y0^{\prime} - y0) + v_{0y}\end{cases} & \text{Equation (11)} \\ \begin{cases}v_{1x}^{\prime} = \frac{(v_{1x} - v_{0x})}{w}(x1^{\prime} - x0) + \frac{(v_{2x} - v_{0x})}{h}(y1^{\prime} - y0) + v_{0x} \\ v_{1y}^{\prime} = \frac{(v_{1y} - v_{0y})}{w}(x1^{\prime} - x0) + \frac{(v_{2y} - v_{0y})}{h}(y1^{\prime} - y0) + v_{0y}\end{cases} & \text{Equation (12)} \\ \begin{cases}v_{2x}^{\prime} = \frac{(v_{1x} - v_{0x})}{w}(x2^{\prime} - x0) + \frac{(v_{2x} - v_{0x})}{h}(y2^{\prime} - y0) + v_{0x} \\ v_{2y}^{\prime} = \frac{(v_{1y} - v_{0y})}{w}(x2^{\prime} - x0) + \frac{(v_{2y} - v_{0y})}{h}(y2^{\prime} - y0) + v_{0y}\end{cases} & \text{Equation (13)}\end{matrix}$

In equations (11)-(13), (x0′, y0′), (x1′, y1′), and (x2′, y2′) are the coordinates of the control points of the current block 1302. If represented as delta MVs, v⃗′₀ = v⃗₀ + (x0′−x0)*∇v⃗_(h) + (y0′−y0)*∇v⃗_(v), v⃗′₁ = v⃗₀ + (x1′−x0)*∇v⃗_(h) + (y1′−y0)*∇v⃗_(v), and v⃗′₂ = v⃗₀ + (x2′−x0)*∇v⃗_(h) + (y2′−y0)*∇v⃗_(v).

Similarly, if the affine motion model of the neighboring coded block (e.g., the neighboring block 1304) is a 4-parameter affine motion model, equation (10) can be applied in deriving the affine motion vectors at the control points for the current block 1302. In some examples, using equation (10) for obtaining the 4-parameter affine motion model can include avoiding equation (13) above.
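The inherited affine MVP derivation of equations (11)-(13) can be summarized with a short, non-normative sketch. The helper below assumes a 6-parameter (three-CPMV) neighboring model; its name and argument layout are illustrative only:

    def inherited_affine_mvp(nbr_cpmvs, nbr_top_left, nbr_size, cur_ctrl_points):
        """Predict the CPMVs of the current block from an affine coded neighbor,
        following equations (11)-(13). Motion vectors are (x, y) tuples.
        nbr_cpmvs:       (v0, v1, v2) of the neighboring block
        nbr_top_left:    (x0, y0) of the neighboring block's top-left control point
        nbr_size:        (w, h) of the neighboring block
        cur_ctrl_points: coordinates of the current block's control points"""
        (v0, v1, v2), (x0, y0), (w, h) = nbr_cpmvs, nbr_top_left, nbr_size
        predicted = []
        for (xc, yc) in cur_ctrl_points:
            vx = (v1[0] - v0[0]) / w * (xc - x0) + (v2[0] - v0[0]) / h * (yc - y0) + v0[0]
            vy = (v1[1] - v0[1]) / w * (xc - x0) + (v2[1] - v0[1]) / h * (yc - y0) + v0[1]
            predicted.append((vx, vy))
        return predicted

For a 4-parameter neighboring model, the same loop would be driven by equation (10) instead, and the third control point of equation (13) would be omitted as noted above.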

FIG. 14 is a diagram which illustrates possible locations for a neighboring candidate block for use in the inherited affine MVP model for a current block 1402. For example, the affine motion vectors 1440, 1442, and 1444, or {v⃗₀, v⃗₁, v⃗₂}, at the control points 1410, 1412, and 1414 of the current block can be derived from one of the neighboring blocks 1430 (block A0), 1426 (block B0), 1428 (block B1), 1432 (block A1), and/or 1420 (block B2). In some cases the neighboring blocks 1424 (block A2) and/or 1422 (block B3) can also be used. More specifically, the motion vector 1440 (e.g., v⃗₀) at the control point 1410 located at the top-left corner of the current block 1402 can be inherited from the neighboring block 1420 (block B2) located above and to the left of the control point 1410, the neighboring block 1422 (block B3) located above the control point 1410, or the neighboring block 1424 (block A2) located to the left of the control point 1410; the motion vector 1442 or v⃗₁ at the control point 1412 located at the top-right corner of the current block 1402 can be inherited from the neighboring block 1426 (block B0) located above the control point 1412 or the neighboring block 1428 (block B1) located above and to the right of the control point 1412; and the motion vector 1444 or v⃗₂ at the control point 1414 located at the bottom-left corner of the current block 1402 can be inherited from the neighboring block 1430 (block A0) located to the left of the control point 1414 or the neighboring block 1432 (block A1) located to the left of and below the control point 1414.

Currently, in some designs (e.g., in MPEG5 Essential Video Coding (EVC)), when the affine inheritance is from an affine coded neighboring block in the above CTU row, the bottom-left and bottom-right sub-block MVs are adopted as the CPMVs, and the 4-parameter affine model is always used to derive the CPMVs of the current CU.

FIG. 15 is a diagram illustrating an affine model in MPEG5 EVC and a spatial neighborhood. FIG. 15 illustrates a current CTU 1500 with a neighboring candidate block 1540 having sub-blocks 1542 and 1544, as well as a left neighbor sub-block 1552 and a below-left neighbor sub-block 1554. While FIG. 15 provides an illustrative example using a CTU, in other examples, the current CTU can be another block, such as a CU, a PU, a TU, etc. The current CTU 1500 includes a current block 1502 with control points 1510, 1512, and 1514 and associated CPMVs 1520, 1522, and 1524. The top-left CPMV 1520 is referred to as v⃗₀ and the top-right CPMV 1522 is shown as v⃗₁. In the illustrated example, the CPMVs 1520 and 1522 are designated as the CPMVs of the current CU (e.g., the current block 1502) that are to be derived from the motion vector v⃗_(LB) (not shown) of the bottom-left sub-block 1542 and the motion vector v⃗_(RB) (not shown) of the bottom-right sub-block 1544 of the neighboring affine coded CU, including candidate block 1540, located above the current CTU 1500. The CPMVs 1520 and 1522 are shown as v⃗₀ and v⃗₁ in FIG. 15, and can be derived by the following equation:

$\begin{matrix}\begin{matrix}\vec{v}_{0} = \frac{\vec{v}_{RB} - \vec{v}_{LB}}{neiW}*(posCurX - posNeiX) + \vec{v}_{LB} \\ \vec{v}_{1} = \frac{\vec{v}_{RB} - \vec{v}_{LB}}{neiW}*(posCurX + curW - posNeiX) + \vec{v}_{LB}\end{matrix} & \text{Equation (14)}\end{matrix}$

where neiW is the width of the neighboring block, curW is the width of the current block, posNeiX is the x coordinate of the top-left pixel (or sample in some examples) of the neighboring block, and posCurX is the x coordinate of the top-left pixel (or sample in some examples) of the current block.
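A non-normative sketch of equation (14) is shown below; it linearly interpolates or extrapolates the bottom-left and bottom-right sub-block MVs of the above-CTU neighbor to the top-left and top-right control points of the current block. The function name is illustrative only:

    def inherit_cpmvs_from_above_ctu(v_lb, v_rb, nei_w, pos_nei_x, pos_cur_x, cur_w):
        """Equation (14): derive the top-left/top-right CPMVs of the current block
        from the bottom-left/bottom-right sub-block MVs of an affine coded
        neighbor in the above CTU row. Motion vectors are (x, y) tuples."""
        def mv_at(pos_x):
            t = (pos_x - pos_nei_x) / nei_w
            return (v_lb[0] + t * (v_rb[0] - v_lb[0]),
                    v_lb[1] + t * (v_rb[1] - v_lb[1]))
        v0 = mv_at(pos_cur_x)           # top-left CPMV of the current block
        v1 = mv_at(pos_cur_x + cur_w)   # top-right CPMV of the current block
        return v0, v1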

There are three affine prediction motion modes in some cases: AF_4_INTER mode, AF_6_INTER mode, and AF_MERGE mode. When a merge/skip flag is true (e.g., equal to a value of 1), and both the width and height for the CU are larger than or equal to 8 samples (or other number of samples), an affine flag at the CU level (or other block level) is signalled in the bitstream to indicate whether affine merge mode is used. When the CU is coded as AF_MERGE, the merge candidate index with maximum value 4 (or other value in some cases) is signalled for specifying which motion information candidate in the affine merge candidate list is used for the CU.

The affine merge candidate list can be constructed according to the following steps: 1) Insert model based affine candidates, where a model based candidate is derived from the affine motion model of its valid spatial neighboring affine coded block. The scan order for the candidate positions can be identical to the merge list order in FIG. 6A, FIG. 6B, and/or FIG. 6C, and includes positions from 0 to 5. 2) Insert control point based affine candidates. If the limit on the affine merge list size is not met, control point based affine candidates are inserted. A control point based affine candidate means the candidate is constructed by combining the neighboring motion information of each control point to form an affine merge candidate.

A total of 4 control points or CPs (denoted as CP1-CP4) are used with coordinates (0, 0), (W, 0), (0, H) and (W, H), respectively, where W and H are the width and height of the current block.

To simplify the complexity of the affine merge list construction process, no scaling is performed when deriving the control point based affine merge candidate. If the control point motion vectors are pointing to different reference indices or the reference index is invalid, the candidate will be considered as unavailable.

When the merge/skip flag is false (e.g., equal to a value of 0), and both the width and height for the CU are larger than or equal to 16 samples (or other number of samples in some cases), an affine flag at the CU level is signalled in the bitstream to indicate whether affine inter mode is used (e.g., AF_4_INTER mode or AF_6_INTER mode). When the CU is coded as affine inter mode, a model flag is signalled for specifying whether a 4-parameter or 6-parameter affine model is used for the CU. If the model flag is true (e.g., equal to a value of 1), AF_6_INTER mode (6-parameter affine model) is applied and 3 MVDs will be parsed; otherwise, if the model flag is false (e.g., equal to a value of 0), AF_4_INTER mode (4-parameter affine model) is applied and 2 MVDs will be parsed.

The affine AMVP candidate list can be constructed according to the following steps: 1) Insert model based affine candidates; 2) Insert control point based affine candidates; 3) Insert a translational based affine AMVP candidate; and 4) Pad with zero motion vectors.

If the number of candidates in the affine merge candidate list is less than 2 (or other value in some cases), zero motion vectors with zero reference indices are inserted until the list is full. To reduce the complexity of the list construction, no pruning is applied.

Sample derivation affine mode for small block sizes (e.g., 4×8 and 8×4 sizes) can be performed. In MPEG5 EVC, the minimal block size for affine coding is set equal to 8×8. However, an encoder can select to implement affine prediction in sub-block sizes of 4×8 or 8×4. MPEG5 EVC specifies affine prediction for such sub-block sizes through an enhanced interpolation filter (EIF). EIF enables affine prediction with per-sample prediction, computing a motion vector independently for each sample. To prevent an MV from pointing outside of the reference picture, the resulting MV for each sample is clipped to the picture size. An extract of the MPEG EVC below shows the implementation of affine prediction with EIF, marked in underlined text in between “<highlight>” and “<highlightend>” symbols (e.g., “<highlight>highlighted text<highlightend>”):

-   -   If an affine_flag is equal to 1 and one of the variables        sbWidth, sbHeight is less than 8, the following applies:        -   Horizontal change of motion vector dX, vertical change of            motion vector dY and base motion vector mvBaseScaled are            derived by invoking the process specified in clause 8.5.3.9            with the luma coding block width nCbW, the luma coding block            height nCbH, number of control point motion vectors numCpMv            and the control point motion vectors cpMvLX[cpIdx] with            cpIdx=0 . . . numCpMv−1 as inputs.        -   The array predSamplesLX_(L) is derived by invoking            interpolation process for enhanced interpolation filter            specified in clause 8.5.4.3 with the luma locations (xSb,            ySb), the luma coding block width nCbW, the luma coding            block height nCbH, horizontal change of motion vector dX,            vertical change of motion vector dY, base motion vector            mvBaseScaled, the reference array refPicLX_(L), sample            bitDepth bitDepth_(Y), picture width            pic_width_in_luma_samples and height            pic_height_in_luma_samples as inputs.

1.1.1.1 Derivation Process for Affine Motion Model Parameters from Control Point Motion Vectors

Inputs to the process are:

-   -   two variables cbWidth and cbHeight specifying the width and the        height of the luma coding block,    -   the number of control point motion vectors numCpMv,    -   the control point motion vectors cpMvLX[cpIdx], with cpIdx=0 . .        . numCpMv−1 and X being 0 or 1.

Outputs of the process are:

-   -   horizontal change of motion vector dX,    -   vertical change of motion vector dY,    -   motion vector mvBaseScaled corresponding to the top left corner        of the luma coding block.

The variables log2CbW and log2CbH are derived as follows:
log2CbW=Log2(cbWidth)  (8-688)
log2CbH=Log2(cbHeight)  (8-689)

Horizontal change of motion vector dX is derived as follows:
dX[0]=(cpMvLX[1][0]−cpMvLX[0][0])<<(7−log2CbW)  (8-690)
dX[1]=(cpMvLX[1][1]−cpMvLX[0][1])<<(7−log2CbW)  (8-691)

Vertical change of motion vector dY is derived as follows:

-   -   If numCpMv is equal to 3, dY is derived as follows:
        dY[0]=(cpMvLX[2][0]−cpMvLX[0][0])<<(7−log2CbH)  (8-692)
        dY[1]=(cpMvLX[2][1]−cpMvLX[0][1])<<(7−log2CbH)  (8-693)
    -   Otherwise (numCpMv is equal to 2), dY is derived as follows:
        dY[0]=−dX[1]  (8-694)
        dY[1]=dX[0]  (8-695)

Motion vector mvBaseScaled corresponding to the top left corner of the luma coding block is derived as follows:
mvBaseScaled[0]=cpMvLX[0][0]<<7  (8-696)
mvBaseScaled[1]=cpMvLX[0][1]<<7  (8-697)
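The derivation of dX, dY and mvBaseScaled above can also be expressed as a compact, non-normative sketch. The integer shifts follow the extract, and the block dimensions are assumed to be powers of two no larger than 128 so that 7−log2CbW and 7−log2CbH are non-negative:

    def derive_affine_params(cb_width, cb_height, num_cpmv, cpmv):
        """Sketch of the derivation process for affine motion model parameters.
        cpmv[i] is the i-th control point MV as [mv_x, mv_y]; dX, dY and
        mvBaseScaled are returned in the left-shifted precision used by EIF."""
        log2_cb_w = cb_width.bit_length() - 1    # Log2(cbWidth)
        log2_cb_h = cb_height.bit_length() - 1   # Log2(cbHeight)
        dX = [(cpmv[1][0] - cpmv[0][0]) << (7 - log2_cb_w),
              (cpmv[1][1] - cpmv[0][1]) << (7 - log2_cb_w)]
        if num_cpmv == 3:                        # 6-parameter model
            dY = [(cpmv[2][0] - cpmv[0][0]) << (7 - log2_cb_h),
                  (cpmv[2][1] - cpmv[0][1]) << (7 - log2_cb_h)]
        else:                                    # 4-parameter model (numCpMv equal to 2)
            dY = [-dX[1], dX[0]]
        mv_base_scaled = [cpmv[0][0] << 7, cpmv[0][1] << 7]
        return dX, dY, mv_base_scaled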

1.1.1.2 Interpolation Process for the Enhanced Interpolation Filter

Inputs to the process are:

-   -   a location (xCb, yCb) in full-sample units,    -   two variables cbWidth and cbHeight specifying the width and the        height of the current coding block,    -   horizontal change of motion vector dX,    -   vertical change of motion vector dY,    -   motion vector mvBaseScaled,    -   the selected reference picture sample arrays refPicLX,    -   sample bit depth bitDepth    -   width of the picture in samples pic_width,    -   height of the picture in samples pic_height.

Outputs of the process are:

-   -   an (cbWidth)×(cbHeight) array predSamplesLX of prediction sample        values.

The variables shift0, shift1, offset0 and offset1 are derived as follows:

shift0 is set equal to bitDepth−6, offset0 is equal to 2^(shift1-1),

shift1 is set equal to 11, offset1 is equal to 1024.

<highlight>For x=−1 . . . cbWidth and y=−1 . . . cbHeight, the following applies:

-   -   The motion vector mvX is derived as follows:        mvX[0]=(mvBaseScaled[0]+dX[0]*x+dY[0]*y)  (8-728)        mvX[1]=(mvBaseScaled[1]+dX[1]*x+dY[1]*y)  (8-729)<highlightend>    -   The variables xInt, yInt, xFrac and yFrac are derived as        follows:        xInt=xCb+(mvX[0]>>9)+x  (8-730)        yInt=yCb+(mvX[1]>>9)+y  (8-731)        xFrac=mvX[0] & 511  (8-732)        yFrac=mvX[1] & 511  (8-733)

The locations (xInt, yInt) inside the given array refPicLX are derived as follows:
<highlight>xInt=Clip3(0,pic_width−1,xInt)  (8-734)
yInt=Clip3(0,pic_height−1,yInt)  (8-735)<highlightend>

The variables a_(x,y), a_(x+1,y), a_(x,y+1), a_(x+1,y+1) are derived as follows:
a_(x,y)=((refPicLX[xInt][yInt]*(512−xFrac)+offset0)>>shift0)*(512−yFrac)  (8-736)
a_(x+1,y)=((refPicLX[xInt+1][yInt]*xFrac+offset0)>>shift0)*(512−yFrac)  (8-737)
a_(x,y+1)=((refPicLX[xInt][yInt+1]*(512−xFrac)+offset0)>>shift0)*yFrac  (8-738)
a_(x+1,y+1)=((refPicLX[xInt+1][yInt+1]*xFrac+offset0)>>shift0)*yFrac  (8-739)

The sample value b_(x,y) corresponding to location (x, y) is derived as follows:
b_(x,y)=(a_(x,y)+a_(x+1,y)+a_(x,y+1)+a_(x+1,y+1)+offset1)>>shift1  (8-740)

The enhancement interpolation filter coefficients eF[ ] are specified as {−1, 10, −1}.

The variables shift2, shift3, offset2 and offset3 are derived as follows:

shift2 is set equal to 4, offset2 is equal to 8,

shift3 is set equal to 15−bitDepth, offset3 is equal to 2^(shift3-1),

For x=0 . . . cbWidth−1 and y=−1 . . . cbHeight, the following applies:
h_(x,y)=(eF[0]*b_(x−1,y)+eF[1]*b_(x,y)+eF[2]*b_(x+1,y)+offset2)>>shift2  (8-741)

For x=0 . . . cbWidth−1 and y=0 . . . cbHeight−1, the following applies:

-   -   predSamplesLX_(L)[x][y]=Clip3(0, (1<<bitDepth)−1,
        (eF[0]*h_(x,y−1)+eF[1]*h_(x,y)+eF[2]*h_(x,y+1)+offset3)>>shift3)
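For reference, the interpolation process above can be sketched in a non-normative way as a per-sample loop: a motion vector is derived for each sample, split into integer and fractional parts, a bilinear value b is computed, and the {−1, 10, −1} enhancement filter is applied horizontally and then vertically. The helper name is illustrative, the rounding offsets follow the extract above, and neighbor indices are additionally clamped to the array bounds purely so the sketch cannot index outside the reference array:

    import numpy as np

    def eif_predict(ref, x_cb, y_cb, cb_w, cb_h, dX, dY, mv_base_scaled, bit_depth):
        """Non-normative sketch of the EIF interpolation. 'ref' is a 2D array
        indexed ref[x][y] like refPicLX; dX, dY and mv_base_scaled are the affine
        parameters from the derivation process above."""
        shift0, shift1, shift2, shift3 = bit_depth - 6, 11, 4, 15 - bit_depth
        offset0, offset1, offset2, offset3 = 1 << (shift1 - 1), 1024, 8, 1 << (shift3 - 1)
        eF = (-1, 10, -1)
        pic_w, pic_h = ref.shape

        b = {}
        for x in range(-1, cb_w + 1):
            for y in range(-1, cb_h + 1):
                mvx = mv_base_scaled[0] + dX[0] * x + dY[0] * y
                mvy = mv_base_scaled[1] + dX[1] * x + dY[1] * y
                xi = min(max(x_cb + (mvx >> 9) + x, 0), pic_w - 1)
                yi = min(max(y_cb + (mvy >> 9) + y, 0), pic_h - 1)
                xf, yf = mvx & 511, mvy & 511
                xi1, yi1 = min(xi + 1, pic_w - 1), min(yi + 1, pic_h - 1)
                a00 = ((int(ref[xi][yi]) * (512 - xf) + offset0) >> shift0) * (512 - yf)
                a10 = ((int(ref[xi1][yi]) * xf + offset0) >> shift0) * (512 - yf)
                a01 = ((int(ref[xi][yi1]) * (512 - xf) + offset0) >> shift0) * yf
                a11 = ((int(ref[xi1][yi1]) * xf + offset0) >> shift0) * yf
                b[(x, y)] = (a00 + a10 + a01 + a11 + offset1) >> shift1

        h = {(x, y): (eF[0] * b[(x - 1, y)] + eF[1] * b[(x, y)]
                      + eF[2] * b[(x + 1, y)] + offset2) >> shift2
             for x in range(cb_w) for y in range(-1, cb_h + 1)}

        pred = np.zeros((cb_w, cb_h), dtype=np.int32)
        for x in range(cb_w):
            for y in range(cb_h):
                v = (eF[0] * h[(x, y - 1)] + eF[1] * h[(x, y)]
                     + eF[2] * h[(x, y + 1)] + offset3) >> shift3
                pred[x][y] = min(max(v, 0), (1 << bit_depth) - 1)
        return pred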

Per-sample MV generation introduced in the enhanced interpolation filter (EIF) can potentially increase the number of memory accesses needed to fetch filter samples, thus increasing memory bandwidth. The increase in the number of memory accesses can be much higher than the single memory fetch typically used for block sizes 4×8 or 8×4 (or other block sizes) in uni-prediction, or the two memory fetches for bi-predicted blocks.

As described above, a large number of fetching operations may not be a problem, such as if the needed reference area is available in a local buffer. The current EIF design introduces MV clipping to picture boundaries, which requires that an entire picture be available in the local buffer. As noted above, techniques and systems are described herein that improve affine mode coding. Each of the techniques described herein can be performed individually or in any combination. In some examples, the systems and techniques described herein restrain (using a restriction or constraint) the reference picture area that can be accessed by affine sample generation (e.g., by EIF) to a certain limit, which in some cases can be set as a function of block size. In some examples, the systems and techniques apply the restriction or constraint to certain block sizes, e.g., of less than 8×8, or less than 4×8, or less than 8×4, or other block sizes. In some cases, the restriction or constraint can be specified as a function of block dimensions.

The restriction or constraint may be imposed through different approaches. One illustrative and non-limiting example of such a constraint can be implemented as follows, described as a modification to an MPEG EVC description for affine motion constraints. For example, an encoding device and/or decoding device can constrain and/or clip one or more affine motion vectors or their output (e.g., reference sample coordinates pointed to by the affine motion vectors), such that the constraining/clipping ensures that no affine vector of higher granularity (e.g., sub-block or sample) will exceed an allowed area. Two examples of forms by which such a constraint can be introduced include a bitstream requirement (conformance) and a normative decoding process. One illustrative example of a normative decoding process that may be implemented through clipping of one or more affine motion vectors (MVs) is as follows (modifying the highlighted portions above by adding the text marked in underlined text in between “<insert>” and “<insertend>” symbols (e.g., “<insert>added text<insertend>”)):

-   -   The motion vector mvX is derived as follows:
        mvX[0]=(mvBaseScaled[0]+dX[0]*x+dY[0]*y)  (8-728)
        mvX[1]=(mvBaseScaled[1]+dX[1]*x+dY[1]*y)  (8-729)
    -   <insert>mvX[0]=Clip3(MinX, MaxX−1, mvX[0])
    -   mvX[1]=Clip3(MinY, MaxY−1, mvX[1])<insertend>
    -   where clipping parameters are derived as a function of block size, current block/sample coordinates and MV.

In the above case, spatial coordinates clipping is not required and may be removed from one or more implementations, which is shown below as strike-through text in between <delete> and <deleteend> symbols, with reference to the corresponding sections 8-734 and 8-735 shown above:

-   -   <delete>xInt=Clip3(0,pic_width−1,xInt)  (8-734)
        yInt=Clip3(0,pic_height−1,yInt)  (8-735)<deleteend>
    -   In another example, a normative decoding process may be implemented through clipping of the actual coordinates used for data fetching, as follows:
        xInt=Clip3(MinX,MaxX−1,xInt)  (8-734)
        yInt=Clip3(MinY,MaxY−1,yInt)  (8-735)
    -   where clipping parameters are derived as a function of block size, current block/sample coordinates and MVs.

In some examples, clipping parameters can be derived by taking into account one or more MVs derived for different spatial positions of an affine block, e.g., either from the X/Y coordinates pointed to by MV v0 (top-left CP), or by another MV provided by the affine model (e.g., v1 or v2). An example of such an implementation is as follows, using a Threshold:
{minX,minY,maxX,maxY}=function(Threshold,{v0∥v1∥v2},{x0,y0})
xInt=Clip3(MinX,MaxX−1,xInt)
yInt=Clip3(MinY,MaxY−1,yInt)

FIG. 16 is a diagram illustrating aspects of an affine model and a spatial neighborhood, in accordance with some examples. FIG. 16 illustrates the current CTU 1500 as well as the adjacent blocks and sub-blocks and the associated control points and motion vectors from FIG. 15 above. While FIG. 16 provides an illustrative example using a CTU, in other examples, the current CTU can be another block, such as a CU, a PU, a TU, etc.

As detailed above, affine coding of the current block 1502 can use reference data. Such reference data can be from a reference 1670 illustrated in FIG. 16. In some cases, the reference 1670 can be part of a picture identified as a reference picture for the current block 1502. Under some circumstances, affine motion vectors can be inconsistent (e.g., can have wide variations within a current block), with indications to widely differing portions of the reference picture. For example, as described above, because affine motion can be associated with motion due to a changing point of view (e.g., movement of a camera position), affine motion vectors can be expected to be fairly consistent across a block. In some circumstances, however, an affine motion vector for one sample of a block is widely different (e.g., pointing in a different direction with a large magnitude) from an affine motion vector for another sample of the block. When such circumstances occur, the memory bandwidth used to access the reference data (e.g., the reference 1670) indicated by the affine motion vectors can degrade performance.

Examples described herein can include devices (e.g., encoding device 104 or decoding device 112) that perform clipping of affine motion vectors to limit (e.g., to the bounding area 1660) the data in the reference 1670 that can possibly be indicated by the affine motion vectors. In some examples, such clipping can be done with a threshold (e.g., the “Threshold” from above). In some examples, the threshold can be a user and/or a system specified ratio of block sizes utilized as a criterion to define the bounding area 1660 (e.g., which can be considered a memory access region, which is the area of the reference 1670 that is stored in a memory or DPB for use in coding the current block 1502), as shown in FIG. 16. A reference 1670 (e.g., a reference picture or a portion of a reference picture, such as a reference block) includes a bounding area 1660 (e.g., a portion of the reference 1670) that can be pointed to by one or more affine motion vectors based on limits applied to the affine motion vectors (e.g., clipping parameters). Arrow 1690 indicates the relationship between samples or points in the current block 1502 and the bounding area 1660, such that the affine motion vectors are limited (e.g., by clipping parameters) to the bounding area 1660. Depending on the different affine motion parameters, the relationship between samples of the current block 1502 and the data in the bounding area 1660 indicated by the affine vectors can vary to match the particular affine motion being coded in an affine coding mode. Additional details related to the relationship between samples of a current block (e.g., current block 1502) and data referenced from a reference picture (e.g., data from the bounding area 1660 in the reference 1670) are described in detail below with respect to FIG. 18A, FIG. 18B, and FIG. 18C. In many circumstances, the affine motion vectors will have consistent values (e.g., due to the nature of affine motion, such as point-of-view movement as described above) across a current block (e.g., the current block 1502), in which case performance degradation from limiting the affine motion vectors will typically be limited.

The limitation of the possible reference data to the bounding area 1660 can prevent performance degradation associated with memory bandwidth, and can limit the possible data to be referenced to a manageable size that can be buffered in a memory and used for affine coding of the current block 1502. The clipping parameters described herein (e.g., a (cbWidth)×(cbHeight) array; variables such as a horizontal maximum variable, a horizontal minimum variable, a vertical maximum variable, and a vertical minimum variable; or any other such parameters used to limit the reference picture data indicated by affine motion vectors for a current block such as the current block 1502) can be used in various examples to define the bounding area 1660 in the context of the current block 1502 and the reference 1670, and can further be used to store the reference data associated with the bounding area 1660 for use in coding the current block 1502.

FIG. 17 is a diagram illustrating aspects of an affine model and a spatial neighborhood, in accordance with some examples. Similar to FIG. 16, FIG. 17 illustrates the current CTU 1500 as well as the adjacent blocks and sub-blocks and the associated control points and motion vectors from FIG. 15. While FIG. 17 provides an illustrative example using a CTU, in other examples, the current CTU can be another block, such as a CU, a PU, a TU, etc. In some examples, as illustrated by FIG. 17 and described above, clipping parameters can be derived by taking into account one or more motion vectors derived for different spatial positions of an affine block (e.g., actual affine MVs produced for affine sub-blocks), or affine samples within the current block. An example implementation can include clipping parameters as:
{minX,minY,maxX,maxY}=function(Threshold,{mv(x,y)},{x,y})

and clipped motion vectors as follows:
xInt=Clip3(MinX,MaxX−1,xInt)
yInt=Clip3(MinY,MaxY−1,yInt).

Other examples can include other implementations of such motion vectorsand clipping parameters.

As noted, the threshold (e.g., a threshold indicating the bounding area 1660 used to limit reference data that can be indicated by an affine motion vector) can be a user and/or system specified ratio of block sizes utilized as a criterion to define a memory access region (e.g., bounding area 1760), as shown in FIG. 17. In FIG. 17, the reference bounding area 1760 (e.g., similar to the bounding area 1660 of FIG. 16) specifies a bounding area for data of a reference picture that can be pointed to by affine motion vectors from samples of the current block 1502 (e.g., given clipping limitations that prevent data from a reference picture outside of the bounding area 1760 from being indicated). The current block reference area 1750 shows an example representative of a block size of the current block 1502 pointed to by a motion vector for a central location (e.g., under the assumption that a translational motion associated with arrow 1790 is used). The area of the reference picture accessible to process the current block 1502 (e.g., the bounding area 1760) is larger than the current block reference area 1750 due to acceptable variations in the affine vectors, as described in more detail below. An affine motion vector produced for a sample location of the current block 1502, or for vectors from relevant sub-blocks such as sub-blocks 1542, 1544, 1552, or 1554, would point to a location within the bounding area 1760 in accordance with constraints or clipping parameters implemented as part of affine coding. Additional details of an affine motion vector and associated reference data subject to clipping or threshold restrictions are described below with respect to FIG. 18A, FIG. 18B, and FIG. 18C.

In some examples, scaling and/or clipping CP motion vectors for a CU can be performed, or the resulting motion vector change parameters (dXmv, dYmv) can be used to verify that no affine vector of higher granularity (e.g., sub-block or sample) would exceed the allowed area (e.g., such as the bounding area 1660 or 1760). Two examples of forms by which such a constraint can be introduced include bitstream requirements (e.g., for conformance) and a normative decoding process. The normative decoding process may be implemented either through clipping of CP MVs, or through re-adjusting/scaling CP MVs or change parameters, to ensure the constraint imposed on affine motion vectors is satisfied.

In some examples, the MVs of the CP locations {v0, v1, v2} can be clipped before they are utilized in the affine MV derivation. For example, one of the CP MVs (e.g., v0) can be taken as a base, and the other CP MVs (e.g., v1 and v2) can be checked to determine whether they are pointing outside of the bounding area (which can be referred to as checking whether a bounding block violation is identified). If such a bounding block violation is identified, the identified vector can be scaled proportionally so that it points within the bounding area, one side (corner) of which is specified by the base MV (e.g., v0). Similar techniques can be applied for affine motion models having fewer than three CP motion vectors (e.g., fewer than v0, v1, v2) and/or affine motion models having more than three CP motion vectors (e.g., more than v0, v1, v2).

In another example, the motion information of v0, v1 and v2 may remain unmodified; however, the affine parameters dX and dY are scaled accordingly to prevent an affine MV from pointing outside of the bounding block. An example of such an implementation is shown below with text marked in underlined text in between “<insert2>” and “<insertend2>” symbols (e.g., “<insert2>added text<insertend2>”). Similar techniques can be applied for affine motion models having fewer than three CP motion vectors (e.g., fewer than v0, v1, v2) and/or affine motion models having more than three CP motion vectors (e.g., more than v0, v1, v2).

1.1.1.3 Derivation Process for Affine Motion Model Parameters from Control Point Motion Vectors

Inputs to the process are:

-   -   two variables cbWidth and cbHeight specifying the width and the        height of the luma coding block,    -   the number of control point motion vectors numCpMv,    -   the control point motion vectors cpMvLX[cpIdx], with cpIdx=0 . .        . numCpMv−1 and X being 0 or 1.        Outputs of the process are:    -   horizontal change of motion vector dX,    -   vertical change of motion vector dY,    -   motion vector mvBaseScaled corresponding to the top left corner        of the luma coding block.        The variables log2CbW and log2CbH are derived as follows:        log2CbW=Log2(cbWidth)  (8-688)        log2CbH=Log2(cbHeight)  (8-689)        <insert1>Call clip cpMvLX motion vectors to the bounding block        size of (wBB, hBB, cbWidth, cbHeight, xCb, yCb,        ratio);<insertend1>        Horizontal change of motion vector dX is derived as follows:        dX[0]=(cpMvLX[1][0]−cpMvLX[0][0])<<(7−log2CbW)  (8-690)        dX[1]=(cpMvLX[1][1]−cpMvLX[0][1])<<(7−log2CbW)  (8-691)        Vertical change of motion vector dY is derived as follows:    -   If numCpMv is equal to 3, dY is derived as follow:        dY[0]=(cpMvLX[2][0]−cpMvLX[0][0])<<(7−log2CbH)  (8-692)        dY[1]=(cpMvLX[2][1]−cpMvLX[0][1])<<(7−log2CbH)  (8-693)    -   Otherwise (numCpMv is equal to 2), dY is derived as follows:        dY[0]=−dX[1]  (8-694)        dY[1]=dX[0]  (8-695)        <insert2>Derive scaling parameters scDX and scDY from cpMvLX        motion vectors, bounding area parameters, current block        parameters cbWidth, cbHeight and local coordinates. Scale dX and        dY parameters proportionally, to prevent resulting MV from        pointing outside of the bounding block <insertend2>        dX[0]=scDX*dX[0]  ( )        dX[1]=scDX*dX[1]  ( )        dY[0]=scDY*dY[0]  ( )        dY[1]=scDY*dY[1]  ( )        Motion vector mvBaseScaled corresponding to the top left corner        of the luma coding block is derived as follows:        mvBaseScaled[0]=cpMvLX[0][0]<<7  (8-696)        mvBaseScaled[1]=cpMvLX[0][1]<<7  (8-697)

In some examples, a motion vector and/or spatial coordinates accessible for motion compensation are clipped against thresholds instead of picture boundaries. Clipping motion vectors or spatial coordinates using thresholds can be performed to benefit from existing clipping processes (e.g., such as those presented in EVC standards). For instance, parameters of clipping can be computed once per block from tabulated parameters as follows (with emphasized text marked with underlines in between “<highlight>” and “<highlightend>” symbols (e.g., “<highlight>highlighted text<highlightend>”)):

Deviation_A[5]={16, 80, 224, 512, 1088};

Deviation_B[5]={16, 96, 240, 528, 1104};

hor_min=(center_mv_hor−Deviation_A[log2(w)−3])<<5;

ver_min=(center_mv_ver−Deviation_A[log2(h)−3])<<5;

hor_max=(center_mv_hor+Deviation_B[log2(w)−3])<<5;

ver_max=(center_mv_ver+Deviation_B[log2(h)−3])<<5;
<highlight>mvX[0]=Clip3(hor_min,hor_max,mvX[0])  (8-734)
mvX[1]=Clip3(ver_min,ver_max,mvX[1])  (8-735)<highlightend>
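A minimal, non-normative sketch of this per-block computation is shown below; the Clip3 helper and the deviation tables mirror the text above, and the function names are illustrative only:

    def clip3(lo, hi, v):
        """Clip3(lo, hi, v) as used in the pseudo-code above."""
        return max(lo, min(hi, v))

    DEVIATION_A = [16, 80, 224, 512, 1088]
    DEVIATION_B = [16, 96, 240, 528, 1104]

    def mv_clip_bounds(center_mv_hor, center_mv_ver, w, h):
        """Per-block clipping bounds around the center MV, computed once per block
        from the tabulated deviations (w and h assumed to be powers of two >= 8)."""
        log2w, log2h = w.bit_length() - 1, h.bit_length() - 1
        hor_min = (center_mv_hor - DEVIATION_A[log2w - 3]) << 5
        ver_min = (center_mv_ver - DEVIATION_A[log2h - 3]) << 5
        hor_max = (center_mv_hor + DEVIATION_B[log2w - 3]) << 5
        ver_max = (center_mv_ver + DEVIATION_B[log2h - 3]) << 5
        return hor_min, hor_max, ver_min, ver_max

    def clip_affine_mv(mvX, hor_min, hor_max, ver_min, ver_max):
        """Clip a per-sample affine MV against the per-block bounds."""
        return [clip3(hor_min, hor_max, mvX[0]), clip3(ver_min, ver_max, mvX[1])]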

As described herein, affine sample generation can be used for video coding (e.g., video encoding and/or decoding), including standards-based coding such as EVC, VVC, and/or other existing or to-be-developed coding standards. Affine coding modes in video coding allow a non-translational motion vector for a current block (e.g., the current block 1502) that is being coded with predictive processing operations. In some such systems, as described above, there is not a single motion vector for the entire current block (e.g., CU, CTU, PU, TU, or other block). Instead, some samples within a block have independent affine motion vectors. Each sample in such a block may have an independent motion vector which can point quite far around the reference picture identified for the block. An affine mode coder operating without limits can call or fetch data from a large area of the reference picture, using significant memory resources for prediction operations in coding (e.g., using reference picture data that exceeds the capacity of a DPB). In some such systems, an enhanced interpolation filter (EIF) generates an independent motion vector for each sample, and fetching data for such vectors separately can be bandwidth intensive and use significant computation resources. The fetched reference data is stored (e.g., buffered) in a memory that stores samples from areas of the reference picture indicated by the motion vectors. To provide acceptable performance, the referenceable data that can be fetched to the memory can be restricted by clipping parameters in accordance with examples described herein.

The restriction can be done in various ways, including restricting coordinates, restricting the motion vectors used by the affine mode prediction, modifying affine parameters for clipping, applying clipping constraints on horizontal and vertical motion vectors with a division table used to clip the vectors that are outside of the defined area, and using other such restrictions. Some examples include devices and processes that restrict the motion vector magnitude to be bounded by a certain boundary around a central location (e.g., the bounding area 1660 of FIG. 16, the bounding area 1760 of FIG. 17, or the bounding area 1810 around center position 1854 of FIG. 18A described below). Some such examples can operate by getting a current block, getting control points for affine prediction, generating a synthetic motion vector, and approximating a motion vector for a sample located in the center of the block. In some such examples, a center location can be used with a DPB to store data from a reference (e.g., data from the bounding area 1660 or 1760 from a reference picture). In some examples, a minimum-maximum (min-max) deviation of the affine motion vectors that are allowed for the block can be defined, with any vector pointing outside the limiting region for the vector clipped to the limiting region.

In some examples, for different block sizes, operations of an affine coding mode are configured by a coding device to fetch different reference area sizes. In some examples, the size ratios are associated by a device configuration with what is computationally feasible (e.g., without performance degradation) for a particular device or system. In some examples, for each sample, a coding device is configured by an affine coding mode to fetch a certain number of reference samples. In some such examples described herein, thresholds for certain block sizes are indicated as part of the affine coding mode. In other examples, other thresholds can be used. In some examples, affine motion vector clipping parameters can be derived, as part of affine coding mode operations, from a central vector in the reference picture and other input values. In some such examples with a central sample with a motion vector, the motion vector points to a position in a reference picture. The position in the reference picture in such examples gives a central location of a reference area. The size of the reference area is defined by a deviation value which, in some examples, is fixed by the central location identified by the central motion vector.

In some examples, clipping parameters are deviations that are block-size dependent. For example, Deviation A and Deviation B have the specified values described above as Deviation_A[5]={16, 80, 224, 512, 1088} and Deviation_B[5]={16, 96, 240, 528, 1104}. Such values are specified based on certain size values, such as an image resolution, and will be different for images with different size values (e.g., different image resolutions).

As described above, in some examples a sample in a current block subject to affine coding has an affine motion vector pointing to a reference picture. The motion vector sets the center position from which the referenceable area (e.g., an area such as the bounding area 1660 or 1760) is defined. The affine motion vector is defined from the standard affine motion vector generation process as part of affine coding operations. A control point motion vector is determined and sub-block or sample motion vectors are derived as part of affine coding operations based on an affine motion model.

FIG. 18A is a diagram illustrating aspects of clipping using thresholds, in accordance with some examples. As shown in the example of FIG. 18A, a block 1860 is a current CU with an implementation of EIF affine coding in accordance with examples described herein. A central motion vector 1850 from a sample 1852 of the block 1860 points to a reference block sized area 1862 of a reference picture, which is the same size as the block 1860 (the CU). A center position 1854 marks the center of the bounding area 1810 and of the reference block sized area 1862. The areas 1864, 1866, and 1868 show the allowed deviation for the motion vectors 1850, 1840, and 1830 corresponding to the samples 1852, 1842, and 1832. In some examples, the area size for the areas 1864, 1866, and 1868 is defined by a deviation given by ((MV(center)−1)/(MV(center)+1)) integer pixels (e.g., for the size of 8).

With the restriction or deviation defined as described above, in some examples, the top-left position 1834 associated with sample 1832 can allow a shift associated with a motion vector by a deviation width/2 (w/2) and height/2 (h/2), still bounded by the central MV(center)−1/MV(center)+1 (e.g., central motion vector 1830) for area 1864. The deviation and bounding illustrated for the area 1864 and position 1834 (e.g., associated with the central motion vector 1830 and the sample 1832) introduce an effective bounding block 1810 of memory access for the samples of the current block 1860 when applied to all samples of the current block 1860. In the example of FIG. 18A, the bounding block 1810 can be considered the reference block sized area 1862 (e.g., which is the same size as the current block 1860) plus a deviation_on_mv caused by the bounding areas around the extreme positions at the edge of the area 1862, such as the areas 1864 and 1868 around the positions 1834 and 1844. By placing a clipping restriction on every sample of the current block 1860 to restrict the motion vectors to a deviation associated with the size of the areas 1864, 1866, and 1868 (e.g., such that these areas are the same size and the deviations for all other affine motion vectors for samples from the current block 1860 will be the same size), the affine mode as described herein introduces a restriction on every single motion vector within the current block 1860. In some examples, such restrictions are applied to a motion vector even if the motion vector is within the bounding block. Such a solution effectively provides a “translationalization” of affine motion, which is further described below with respect to FIGS. 18B and 18C.

FIG. 18B is a diagram illustrating aspects of clipping using thresholds, in accordance with some examples. FIG. 18B illustrates an example of the current block 1860 with a particular set of affine motion vectors 1836, 1847, and 1877 for corresponding samples 1832, 1842, and 1872. As described above, each sample is associated with an area that defines a maximum deviation for an affine vector associated with a particular sample. The sample 1832 is associated with the area 1864, the sample 1842 is associated with the area 1868, and the sample 1872 is associated with the area 1876. As described above, if the affine motion vector is outside of the defined limiting area for that vector (e.g., as indicated by a center vector for the sample), the affine motion vector is adjusted with a clipping operation to generate a clipped affine motion vector that does not deviate from the associated area for the sample, and therefore will also not deviate from the bounding block 1810. Since data for the bounding block 1810 can be stored in a local buffer as described above, the coding device can perform the operations for all samples of the current block 1860 without fetching additional reference data and degrading device performance with excessive memory bandwidth use.

In the example of FIG. 18B, the affine motion vector 1836 for the sample 1832 is outside of the area 1864 (e.g., with the center position 1834 associated with the sample 1832). The affine motion vector 1836 is clipped by affine mode operations to create the clipped affine motion vector 1838, which points within the border of the area 1864 and the bounding block 1810. By contrast, the motion vector 1877 for the sample 1872, associated with the center position 1874 and the area 1876, points to the position 1875. Since the position 1875 indicated by the affine motion vector 1877 is within the area 1876, the affine motion vector 1877 is not clipped. Similarly, the motion vector 1847 for the sample 1842, associated with the center position 1844 and the area 1868, points to the position 1845, which is within both the area 1868 and the bounding block 1810, and so the affine motion vector 1847 is not clipped.

As described above for FIG. 18A, the center position 1854 for the sample 1852 is used to define the center motion vector 1850. Regardless of whether the center motion vector 1850 indicates a large motion or a small motion, the bounding areas for all other samples of that current block 1860, including samples 1832, 1842, 1872, and other samples, have associated clipping areas based on central vectors that can be translations (e.g., parallel vectors with identical magnitudes but different positions or intersections with the current block). In various examples, the change in position between different sample positions will match the change in position of the associated areas; for example, the difference between the positions of samples 1872 and 1832 will be the same as the difference between positions 1874 and 1834, and the same as the difference between areas 1876 and 1864. The relationship between the restriction areas and their corresponding samples is the “translationalization” of affine motion referred to above.

FIG. 18C is a diagram illustrating aspects of clipping using thresholds, in accordance with some examples. FIG. 18C illustrates an example similar to FIG. 18B, but with different affine motion vectors. In the example of FIG. 18C, the affine vector 1836 for the sample 1832 is the same as in FIG. 18B; however, motion vectors 1886 and 1896 for respective samples 1872 and 1842 are different. Similar to the motion vector 1836, the motion vector 1896 exceeds the allowable motion vector deviation, and so motion vector 1896 is processed to generate clipped motion vector 1898 pointing to position 1897, which is within bounding block 1810 and area 1868. In the example of FIG. 18C, affine motion vector 1886 is clipped, even though it points within bounding block 1810. Because affine motion vector 1886 exceeds the allowable variation indicated by area 1876, it is processed to generate clipped motion vector 1888 pointing to position 1887, which is within area 1876 associated with center position 1874 for sample 1872. In the above example, even though motion vector 1886 points to a position within bounding block 1810, the motion vector 1886 is clipped due to the clipping parameters. Applying such clipping parameters within bounding area 1810 can simplify clipping operations and provide efficient use of system resources.

In another example, only the memory area can be defined by a bounding block (e.g., without restrictions on motion within the bounding block, such as the restriction on motion vector 1886 associated with area 1876 described above). In such an example, a memory bounding block (e.g., bounding area 1660, 1760, or 1810) can be implemented on final x/y coordinates, for example with integer precision. Such an example would allow unrestricted affine motion vectors within a bounding block. In such an example, motion vectors 1836 and 1896 would be clipped, but motion vector 1886 would not be clipped, as the reference data indicated by motion vector 1886 is within bounding block 1810 and would be stored in the memory and available for affine coding without an additional restriction associated with area 1876. In such an example, additional computing resources may be used to structure the clipping and allow improved performance, at the cost of the resources needed to structure the more complex clipping operations, while maintaining the memory bandwidth performance of other examples (e.g., with the same bounding area 1660, 1760, or 1810 limitation on reference data, but without individual area restrictions per motion vector within the bounding block, such as the areas 1864, 1876, and 1868).

An example of a deviation restriction is as follows (with emphasized text marked with underlines in between “<highlight>” and “<highlightend>” symbols (e.g., “<highlight>highlighted text<highlightend>”)):

Deviation_A[5]={16, 80, 224, 512, 1088};

Deviation_B[5]={16, 96, 240, 528, 1104};

MinX=center_pos_x+(center_mv_hor−Deviation_A[log2(w)−3])<<5;

MinY=center_pos_y+(center_mv_ver−Deviation_A[log2(h)−3])<<5;

MaxX=center_pos_x+(center_mv_hor+Deviation_B[log2(w)−3])<<5;

MaxY=center_pos_y+(center_mv_ver+Deviation_B[log2(h)−3])<<5;
<highlight>xInt=Clip3(MinX,MaxX−1,xInt)  (8-734)
yInt=Clip3(MinY,MaxY−1,yInt)  (8-735)<highlightend>

An example of specification text providing an implementation of such a solution is shown below, with text marked in underlined text in between “<insert1>” and “<insertend1>” symbols (e.g., “<insert1>added text<insertend1>”):

1.1.1.4 Interpolation Process for the Enhanced Interpolation Filter

Inputs to the process are:

-   -   a location (xCb, yCb) in full-sample units,    -   two variables cbWidth and cbHeight specifying the width and the        height of the current coding block,    -   horizontal change of motion vector dX,    -   vertical change of motion vector dY,    -   motion vector mvBaseScaled,    -   the selected reference picture sample arrays refPicLX,    -   sample bit depth bitDepth    -   width of the picture in samples pic_width,    -   height of the picture in samples pic_height.        Outputs of the process are:    -   an (cbWidth)×(cbHeight) array predSamplesLX of prediction sample        values.

The variables shift0, shift1, offset0 and offset1 are derived as follows:

-   -   shift0 is set equal to bitDepth−6, offset0 is equal to 2^(shift1-1),
    -   shift1 is set equal to 11, offset1 is equal to 1024
        <insert>The variables hor_max, ver_max, hor_min and ver_min are derived by invoking the process specified in 0 with a location (xCb, yCb) in full-sample units, two variables cbWidth and cbHeight specifying the width and the height of the current coding block, horizontal change of motion vector dX, vertical change of motion vector dY, motion vector mvBaseScaled, width of the picture in samples pic_width and height of the picture in samples pic_height as input, and hor_max, ver_max, hor_min and ver_min as output.<insertend>

For x=−1 . . . cbWidth and y=−1 . . . cbHeight, the following applies:

-   -   The motion vector mvX is derived as follows:
        mvX[0]=(mvBaseScaled[0]+dX[0]*x+dY[0]*y)  (8-728)
        mvX[1]=(mvBaseScaled[1]+dX[1]*x+dY[1]*y)  (8-729)
        <insert>mvX[0]=Clip3(hor_min,hor_max,mvX[0])  (8-730)
        mvX[1]=Clip3(ver_min,ver_max,mvX[1])  (8-731)<insertend>
        <insert>1.1.1.5 Derivation of Clipping Parameters for Affine Motion Vector
        Inputs to the process are:
    -   a location (xCb, yCb) in full-sample units,
    -   two variables cbWidth and cbHeight specifying the width and the height of the current coding block,
    -   horizontal change of motion vector dX,
    -   vertical change of motion vector dY,
    -   motion vector mvBaseScaled,
    -   width of the picture in samples pic_width,
    -   height of the picture in samples pic_height.
        Outputs of the process are:
    -   hor_max, ver_max, hor_min and ver_min that denote the maximum and minimum allowed motion vector horizontal and vertical components.
        The center motion vector mv_center is derived as follows:
        mv_center[0]=(mvBaseScaled[0]+dX[0]*(cbWidth>>1)+dY[0]*(cbHeight>>1))  (8-743)
        mv_center[1]=(mvBaseScaled[1]+dX[1]*(cbWidth>>1)+dY[1]*(cbHeight>>1))  (8-743)
        The rounding process for motion vectors as specified in clause 8.5.3.10 is invoked with mv_center, rightShift set equal to 5, and leftShift set equal to 0 as inputs, and the rounded motion vector is returned as mv_center.
        The motion vector mv_center is clipped as follows:
        mv_center[0]=Clip3(−2¹⁷,2¹⁷−1,mv_center[0])  (8-686)
        mv_center[1]=Clip3(−2¹⁷,2¹⁷−1,mv_center[1])  (8-686)
        The variables mv_hor_min, mv_ver_min, mv_hor_max and mv_ver_max are derived as follows:
        mv_hor_min=mv_center[0]−deviationA[log2CbWidth−3]  (8-743)
        mv_ver_min=mv_center[1]−deviationA[log2CbHeight−3]  (8-743)
        mv_hor_max=mv_center[0]+deviationB[log2CbWidth−3]  (8-743)
        mv_ver_max=mv_center[1]+deviationB[log2CbHeight−3]  (8-743)

with deviationA and deviationB specified for k=0 . . . 4 as:

deviationA[k]={16, 80, 224, 512, 1088},

deviationB[k]={16, 96, 240, 528, 1104}.

The variables hor_max_pic, ver_max_pic, hor_min_pic and ver_min_pic are derived as follows:
hor_max_pic=(pic_width+128−xCb−cbWidth+1)<<4  (8-743)
ver_max_pic=(pic_height+128−yCb−cbHeight+1)<<4  (8-743)
hor_min_pic=(−128−xCb)<<4  (8-743)
ver_min_pic=(−128−yCb)<<4  (8-743)
The outputs hor_max, ver_max, hor_min and ver_min that denote the maximum and minimum allowed motion vector horizontal and vertical components are derived as follows:
hor_max=min(hor_max_pic,mv_hor_max)<<5  (8-743)
ver_max=min(ver_max_pic,mv_ver_max)<<5  (8-743)
hor_min=max(hor_min_pic,mv_hor_min)<<5  (8-743)
ver_min=max(ver_min_pic,mv_ver_min)<<5  (8-743)<insertend>
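The full clipping-parameter derivation above (center MV, rounding, MV-range clipping, tabulated deviations, and picture-based limits) can be summarized with the following non-normative sketch; the rounding step is a simplified stand-in for the rounding process of clause 8.5.3.10, and the function name is illustrative only:

    def derive_affine_clip_params(x_cb, y_cb, cb_w, cb_h, dX, dY, mv_base_scaled,
                                  pic_width, pic_height):
        """Sketch of the derivation of hor_min, hor_max, ver_min and ver_max."""
        deviationA = [16, 80, 224, 512, 1088]
        deviationB = [16, 96, 240, 528, 1104]
        log2_w, log2_h = cb_w.bit_length() - 1, cb_h.bit_length() - 1

        # Center MV of the block in the same precision as mvBaseScaled / dX / dY.
        mv_center = [mv_base_scaled[0] + dX[0] * (cb_w >> 1) + dY[0] * (cb_h >> 1),
                     mv_base_scaled[1] + dX[1] * (cb_w >> 1) + dY[1] * (cb_h >> 1)]
        # Simplified rounding with rightShift = 5, then clipping to the MV range.
        mv_center = [(v + 16) >> 5 for v in mv_center]
        mv_center = [max(-2**17, min(2**17 - 1, v)) for v in mv_center]

        mv_hor_min = mv_center[0] - deviationA[log2_w - 3]
        mv_ver_min = mv_center[1] - deviationA[log2_h - 3]
        mv_hor_max = mv_center[0] + deviationB[log2_w - 3]
        mv_ver_max = mv_center[1] + deviationB[log2_h - 3]

        hor_max_pic = (pic_width + 128 - x_cb - cb_w + 1) << 4
        ver_max_pic = (pic_height + 128 - y_cb - cb_h + 1) << 4
        hor_min_pic = (-128 - x_cb) << 4
        ver_min_pic = (-128 - y_cb) << 4

        hor_max = min(hor_max_pic, mv_hor_max) << 5
        ver_max = min(ver_max_pic, mv_ver_max) << 5
        hor_min = max(hor_min_pic, mv_hor_min) << 5
        ver_min = max(ver_min_pic, mv_ver_min) << 5
        return hor_min, hor_max, ver_min, ver_max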

FIG. 19 is a flowchart illustrating a process 1900 of affine coding with clipping parameters in accordance with examples described herein. In some examples, the process 1900 can be performed by the encoding device 104 or the decoding device 112. In some examples, the process 1900 can be embodied as instructions in a computer-readable storage medium that, when executed by processing circuitry of a device, cause the device to perform the operations of the process 1900.

At block 1902, the process 1900 includes operations to obtain a current coding block from the video data. Such operations can be part of sequential operations to process multiple coding blocks, with the clipping parameters determined per block and used for each sample of a current block. When one block is coded and the operations move to a next block, a new set of clipping parameters can be determined for the new block and used for each sample of the new block. In some examples of the process 1900, the control data comprises values from a derivation table.

At block 1904, the process 1900 includes operations to determine control data for the current coding block. In some examples, the control data can include the inputs described above, consisting of a location (xCb, yCb) in full-sample units, two variables cbWidth and cbHeight specifying the width and the height of the current coding block, a horizontal change of motion vector dX, a vertical change of motion vector dY, a motion vector mvBaseScaled, a width of the picture in samples pic_width, and a height of the picture in samples pic_height. In other examples, other combinations or groupings of data can be used. In another example, the control data comprises: a location with an associated horizontal coordinate and an associated vertical coordinate in full-sample units; a width variable specifying a width of the current coding block; a height variable specifying a height of the current coding block; a horizontal change of motion vector; a vertical change of motion vector; a base scaled motion vector; a height of a picture associated with the current coding block in samples; and a width of the picture in samples.

At block 1906, the process 1900 includes operations to determine one or more affine motion vector clipping parameters from the control data. In some examples, the affine motion vector clipping parameters comprise: a horizontal maximum variable; a horizontal minimum variable; a vertical maximum variable; and a vertical minimum variable.

In some examples, the horizontal minimum variable is defined by a maximum value selected from a horizontal minimum picture value and a horizontal minimum motion vector value. In some such examples, the horizontal minimum variable (hor_min) is defined by a maximum value selected from a horizontal minimum picture value (hor_min_pic) and a horizontal minimum motion vector value (mv_hor_min) as: hor_min=max(hor_min_pic, mv_hor_min).

In some such examples, the horizontal minimum picture value (hor_min_pic) is determined from the associated horizontal coordinate. In some such examples, hor_min_pic is defined as: hor_min_pic=(−128−xCb).

In some examples, the horizontal minimum motion vector value is determined from a center motion vector value, an array of values based on a resolution value associated with the video data or a block area size (e.g., a current coding block width×height), and the width variable specifying the width of the current coding block. In some such examples, mv_hor_min is defined as: mv_hor_min=mv_center[0]−deviationA[log2CbWidth−3]; where mv_center[0] is a center motion vector value, deviationA is an array of values based on a resolution value associated with the video data or a block area size (e.g., a current coding block width×height), and cbWidth is the width variable specifying the width of the current coding block.

In some examples, the center motion vector value is determined from the base scaled motion vector, the horizontal change of motion vector, the width variable, and the height variable. In some such examples, the center motion vector value is defined as:

-   -   mv_center[0]=(mvBaseScaled[0]+dX[0]*(cbWidth>>1)+dY[0]*(cbHeight>>1)).

In some examples, the base scaled motion vector corresponds to a top left corner of the current coding block and is determined from control point motion vector values. In some examples, mvBaseScaled corresponds to the top left corner of the luma coding block and is defined as: mvBaseScaled[0]=cpMvLX[0][0]<<7; mvBaseScaled[1]=cpMvLX[0][1]<<7; where cpMvLX are control point motion vectors.

The above aspects of block 1906 primarily describe operations for determining parameters associated with the horizontal minimum variable (hor_min). Each of the other combinations of the horizontal, vertical, maximum, and minimum parameters for vector clipping can have similar examples as described herein, including elements for a horizontal maximum variable, a vertical maximum variable, and a vertical minimum variable.

In some examples, the horizontal maximum variable is defined by a minimum value selected from a horizontal maximum picture value and a horizontal maximum motion vector value. In some examples, the horizontal maximum picture value is determined from the width of the picture, the associated horizontal coordinate, and the width variable. In some examples, the horizontal maximum motion vector value is determined from a center motion vector value, an array of values based on a resolution value associated with the video data or a block area size (e.g., a current coding block width×height), and the width variable specifying the width of the current coding block. In some examples, the center motion vector value is determined from the base scaled motion vector, the horizontal change of motion vector, the width variable, and the height variable. In some examples, a base scaled motion vector corresponds to a corner of the current coding block and is determined from control point motion vector values.

In some examples, the vertical maximum variable is defined by a minimum value selected from a vertical maximum picture value and a vertical maximum motion vector value. In some examples, the vertical maximum picture value is determined from the height of the picture, the associated vertical coordinate, and the height variable. In some examples, the vertical maximum motion vector value is determined from a center motion vector value, an array of values based on a resolution value associated with the video data or a block area size (e.g., a current coding block width×height), and the height variable specifying the height of the current coding block.

In some examples, the vertical minimum variable is defined by a maximum value selected from a vertical minimum picture value and a vertical minimum motion vector value. In some examples, the vertical minimum picture value is determined from the associated vertical coordinate. In some examples, the vertical minimum motion vector value is determined from a center motion vector value, an array of values based on a resolution value associated with the video data or a block area size (e.g., a current coding block width×height), and the height variable specifying the height of the current coding block.

As part of block 1906, additional particular derivations of the parameters can be performed, including derivations with elements for a horizontal maximum variable, a vertical maximum variable, and a vertical minimum variable. In some examples, these variables can be determined in accordance with details described herein, including:

-   hor_max=min(hor_max_pic, mv_hor_max)<<5;
-   ver_max=min(ver_max_pic, mv_ver_max)<<5;
-   hor_min=max(hor_min_pic, mv_hor_min)<<5;
-   ver_min=max(ver_min_pic, mv_ver_min)<<5;
-   mv_center[0]=(mvBaseScaled[0]+dX[0]*(cbWidth>>1)+dY[0]*(cbHeight>>1));
-   mv_center[1]=(mvBaseScaled[1]+dX[1]*(cbWidth>>1)+dY[1]*(cbHeight>>1));
-   mv_center[0]=Clip3(−2¹⁷, 2¹⁷−1, mv_center[0]);
-   mv_center[1]=Clip3(−2¹⁷, 2¹⁷−1, mv_center[1]);
-   mv_hor_min=mv_center[0]−deviationA[log2CbWidth−3];
-   mv_ver_min=mv_center[1]−deviationA[log2CbHeight−3];
-   mv_hor_max=mv_center[0]+deviationB[log2CbWidth−3];
-   mv_ver_max=mv_center[1]+deviationB[log2CbHeight−3];
-   with deviationA and deviationB specified for k=0 . . . 4 as deviationA[k]={16, 80, 224, 512, 1088} and deviationB[k]={16, 96, 240, 528, 1104};
-   hor_max_pic=(pic_width+128−xCb−cbWidth+1)<<4;
-   ver_max_pic=(pic_height+128−yCb−cbHeight+1)<<4;
-   hor_min_pic=(−128−xCb)<<4;
-   ver_min_pic=(−128−yCb)<<4;

and other such details described herein. In other examples, other similar processes for determining parameters for clipping can be used.
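
The derivations listed above can be collected into a single per-block routine. The following C sketch assumes the values listed above (the deviationA/deviationB tables, the 18-bit motion vector range, and the 128-sample picture margin) and packages the four outputs in an illustrative struct; the function and struct names are assumptions for this sketch, and the exact scaling shifts may differ between variants described herein.

    #include <stdint.h>

    #define CLIP3(lo, hi, x) ((x) < (lo) ? (lo) : ((x) > (hi) ? (hi) : (x)))

    /* Hypothetical container for the four clipping parameters listed above. */
    typedef struct {
        int32_t hor_min, hor_max, ver_min, ver_max;
    } AffineClipParams;

    /* deviationA/deviationB for k = 0..4, as specified in the list above. */
    static const int32_t deviationA[5] = {16, 80, 224, 512, 1088};
    static const int32_t deviationB[5] = {16, 96, 240, 528, 1104};

    static AffineClipParams derive_affine_clip_params(
            const int32_t mvBaseScaled[2], const int32_t dX[2], const int32_t dY[2],
            int xCb, int yCb, int cbWidth, int cbHeight,
            int log2CbWidth, int log2CbHeight,
            int pic_width, int pic_height) {
        AffineClipParams p;

        /* Center MV of the block, clipped to the 18-bit signed MV range. */
        int32_t mv_center0 = mvBaseScaled[0] + dX[0] * (cbWidth >> 1) + dY[0] * (cbHeight >> 1);
        int32_t mv_center1 = mvBaseScaled[1] + dX[1] * (cbWidth >> 1) + dY[1] * (cbHeight >> 1);
        mv_center0 = CLIP3(-(1 << 17), (1 << 17) - 1, mv_center0);
        mv_center1 = CLIP3(-(1 << 17), (1 << 17) - 1, mv_center1);

        /* MV-based deviation bounds around the center MV. */
        int32_t mv_hor_min = mv_center0 - deviationA[log2CbWidth - 3];
        int32_t mv_ver_min = mv_center1 - deviationA[log2CbHeight - 3];
        int32_t mv_hor_max = mv_center0 + deviationB[log2CbWidth - 3];
        int32_t mv_ver_max = mv_center1 + deviationB[log2CbHeight - 3];

        /* Picture-based bounds (128-sample margin around the picture). */
        int32_t hor_max_pic = (pic_width  + 128 - xCb - cbWidth  + 1) << 4;
        int32_t ver_max_pic = (pic_height + 128 - yCb - cbHeight + 1) << 4;
        int32_t hor_min_pic = (-128 - xCb) << 4;
        int32_t ver_min_pic = (-128 - yCb) << 4;

        /* Final parameters: the tighter of the two bounds, rescaled as above. */
        p.hor_max = (hor_max_pic < mv_hor_max ? hor_max_pic : mv_hor_max) << 5;
        p.ver_max = (ver_max_pic < mv_ver_max ? ver_max_pic : mv_ver_max) << 5;
        p.hor_min = (hor_min_pic > mv_hor_min ? hor_min_pic : mv_hor_min) << 5;
        p.ver_min = (ver_min_pic > mv_ver_min ? ver_min_pic : mv_ver_min) << 5;
        return p;
    }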

At block 1908, the process 1900 includes operations to select a sample of the current coding block. As described above, a select number of samples for a current block can be used, or each sample of a current block can be used. Example EVC-based affine prediction can be implemented with different approaches. One example EVC approach utilizes translation motion prediction for sub-blocks. Another example of EVC affine prediction uses finer granularity (e.g., pixelwise) motion prediction. Different approaches can have associated operations to select samples.
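
As an illustration of the two selection granularities mentioned above, the following C sketch contrasts a per-sub-block loop (one representative position per sub-block) with a pixelwise loop. The sub-block size, the callback type, and the function names are assumptions made for this sketch only.

    /* process_sample() is a placeholder for the per-sample work of blocks 1910 and 1912. */
    typedef void (*SampleFn)(int x, int y, void *ctx);

    /* Sub-block approach: visit one representative position (here, the center)
     * per subBlkW x subBlkH sub-block of the coding block. */
    static void select_samples_subblock(int cbWidth, int cbHeight,
                                        int subBlkW, int subBlkH,
                                        SampleFn process_sample, void *ctx) {
        for (int y = 0; y < cbHeight; y += subBlkH)
            for (int x = 0; x < cbWidth; x += subBlkW)
                process_sample(x + (subBlkW >> 1), y + (subBlkH >> 1), ctx);
    }

    /* Pixelwise approach: visit every sample of the coding block. */
    static void select_samples_pixelwise(int cbWidth, int cbHeight,
                                         SampleFn process_sample, void *ctx) {
        for (int y = 0; y < cbHeight; y++)
            for (int x = 0; x < cbWidth; x++)
                process_sample(x, y, ctx);
    }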

At block 1910, the process 1900 includes operations to determine an affine motion vector for the sample of the current coding block. In some examples, the affine motion vector for the sample of the current block is determined according to a first base scaled motion vector value, a first horizontal change of motion vector value, a first vertical change of motion vector value, a second base scaled motion vector value, a second horizontal change of motion vector value, a second vertical change of motion vector value, a horizontal coordinate of the sample, and a vertical coordinate of the sample. In some examples, the motion vector specified as mvX can be derived using: mvX[0]=(mvBaseScaled[0]+dX[0]*x+dY[0]*y); mvX[1]=(mvBaseScaled[1]+dX[1]*x+dY[1]*y).
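
A direct transcription of the mvX derivation above might look like the following C fragment, where x and y are the sample coordinates selected at block 1908; the function wrapper is an assumption added for illustration.

    #include <stdint.h>

    /* Per-sample affine MV derivation following the formulas above:
     * mvX[0] = mvBaseScaled[0] + dX[0]*x + dY[0]*y
     * mvX[1] = mvBaseScaled[1] + dX[1]*x + dY[1]*y */
    static void derive_affine_mv(const int32_t mvBaseScaled[2],
                                 const int32_t dX[2], const int32_t dY[2],
                                 int x, int y, int32_t mvX[2]) {
        mvX[0] = mvBaseScaled[0] + dX[0] * x + dY[0] * y;
        mvX[1] = mvBaseScaled[1] + dX[1] * x + dY[1] * y;
    }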

At block 1912, the process 1900 includes operations to clip the affine motion vector using the one or more affine motion vector clipping parameters to generate a clipped affine motion vector. In some examples, the affine motion vector is clipped according to mvX[0]=Clip3(hor_min, hor_max, mvX[0]); and mvX[1]=Clip3(ver_min, ver_max, mvX[1]).
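
The clipping at block 1912 can then be a per-component Clip3 against the block-level bounds, as in the following sketch (the wrapper function name is an assumption for illustration; Clip3 is the same spec-style clamp used above).

    #include <stdint.h>

    static int32_t Clip3(int32_t lo, int32_t hi, int32_t x) {
        return x < lo ? lo : (x > hi ? hi : x);
    }

    /* Clip the derived MV components to the block-level clipping parameters. */
    static void clip_affine_mv(int32_t mvX[2],
                               int32_t hor_min, int32_t hor_max,
                               int32_t ver_min, int32_t ver_max) {
        mvX[0] = Clip3(hor_min, hor_max, mvX[0]);
        mvX[1] = Clip3(ver_min, ver_max, mvX[1]);
    }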

In addition to the blocks above, some elements of process 1900 can include additional operations, intervening operations, or repetitions of operations of certain blocks. In some examples, such additional operations can include operations to identify a reference picture associated with the current coding block; and store a portion of the reference picture defined by the affine motion vector clipping parameters. Some such operations can function where the portion of the reference picture is stored in the memory buffer for affine motion processing operations using the current coding block.
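
Because the clipping parameters bound the reference area a block can touch, the portion of the reference picture that needs to be resident can be copied once per block. The following C sketch illustrates that idea only; the 1/16-sample MV precision (mvShift = 4), the buffer layout, and all names are assumptions, and reference-picture boundary padding is assumed to have been applied so that the margin offsets remain valid.

    #include <stdint.h>
    #include <string.h>

    /* Illustrative prefetch of the reference-picture region bounded by the
     * clipping parameters. refPic is assumed to point into a padded frame so
     * that negative offsets inside the margin are addressable. */
    static void prefetch_reference_region(const uint8_t *refPic, int refStride,
                                          int xCb, int yCb,
                                          int32_t hor_min, int32_t hor_max,
                                          int32_t ver_min, int32_t ver_max,
                                          int cbWidth, int cbHeight, int mvShift,
                                          uint8_t *buf, int bufStride) {
        int x0 = xCb + (hor_min >> mvShift);                       /* left of accessible region  */
        int y0 = yCb + (ver_min >> mvShift);                       /* top of accessible region   */
        int w  = cbWidth  + ((hor_max - hor_min) >> mvShift) + 1;  /* region width in samples    */
        int h  = cbHeight + ((ver_max - ver_min) >> mvShift) + 1;  /* region height in samples   */
        for (int row = 0; row < h; row++)
            memcpy(buf + row * bufStride, refPic + (y0 + row) * refStride + x0, (size_t)w);
    }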

Similarly, some repeated operations can include operations to sequentially obtain a plurality of current coding blocks from the video data; determine a set of affine motion vector clipping parameters on a per coding block basis for blocks of the plurality of current coding blocks; and fetch portions of corresponding reference pictures using the set of affine motion vector clipping parameters on the per block basis for the plurality of current coding blocks. In any such examples, the operations can further include processing the current block using reference picture data from a reference picture indicated by the clipped affine motion vector. Such a block can be a luma coding block, or any other such block for video data being coded in an affine coding mode. Such a process 1900 can be performed by any device herein, including a device with a memory and one or more processors. Such devices can include a device with a display device coupled to the one or more processors and configured to display images from the video data; and one or more wireless interfaces coupled to the one or more processors, the one or more wireless interfaces comprising one or more baseband processors and one or more transceivers. Other such devices can include other components described herein.
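
A per-block driver corresponding to the repeated operations above could be organized as in the following sketch; the types and helper functions here are placeholder stubs standing in for the operations described in this section, not names from the text.

    /* Placeholder types and stubs for the per-block operations described above. */
    typedef struct { int xCb, yCb, cbWidth, cbHeight; } CodingBlock;
    typedef struct { int hor_min, hor_max, ver_min, ver_max; } ClipParams;

    static ClipParams derive_clip_params_for_block(const CodingBlock *cb) {
        ClipParams p = {0, 0, 0, 0};
        (void)cb;   /* stub: block 1906 derivation would go here */
        return p;
    }
    static void fetch_reference_region(const CodingBlock *cb, const ClipParams *p) { (void)cb; (void)p; }
    static void predict_samples_with_clipping(const CodingBlock *cb, const ClipParams *p) { (void)cb; (void)p; }

    /* Per-block flow: derive clipping parameters once per coding block, fetch
     * the bounded reference region, then derive and clip MVs per sample. */
    static void code_blocks_sequentially(const CodingBlock *blocks, int numBlocks) {
        for (int i = 0; i < numBlocks; i++) {
            ClipParams p = derive_clip_params_for_block(&blocks[i]);
            fetch_reference_region(&blocks[i], &p);
            predict_samples_with_clipping(&blocks[i], &p);
        }
    }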

In some examples, the processes described herein may be performed by acomputing device or an apparatus, such as the encoding device 104, thedecoding device 112, and/or any other computing device. In some cases,the computing device or apparatus may include a processor,microprocessor, microcomputer, or other component of a device that isconfigured to carry out the steps of processes described herein. In someexamples, the computing device or apparatus may include a cameraconfigured to capture video data (e.g., a video sequence) includingvideo frames. For example, the computing device may include a cameradevice, which may or may not include a video codec. As another example,the computing device may include a mobile device with a camera (e.g., acamera device such as a digital camera, an IP camera or the like, amobile phone or tablet including a camera, or other type of device witha camera). In some cases, the computing device may include a display fordisplaying images. In some examples, a camera or other capture devicethat captures the video data is separate from the computing device, inwhich case the computing device receives the captured video data. Thecomputing device may further include a network interface, transceiver,and/or transmitter configured to communicate the video data. The networkinterface, transceiver, and/or transmitter may be configured tocommunicate Internet Protocol (IP) based data or other network data.

The processes described herein can be implemented in hardware, computerinstructions, or a combination thereof. In the context of computerinstructions, the operations represent computer-executable instructionsstored on one or more computer-readable storage media that, whenexecuted by one or more processors, perform the recited operations.Generally, computer-executable instructions include routines, programs,objects, components, data structures, and the like that performparticular functions or implement particular data types. The order inwhich the operations are described is not intended to be construed as alimitation, and any number of the described operations can be combinedin any order and/or in parallel to implement the processes.

Additionally, the processes described herein may be performed under thecontrol of one or more computer systems configured with executableinstructions and may be implemented as code (e.g., executableinstructions, one or more computer programs, or one or moreapplications) executing collectively on one or more processors, byhardware, or combinations thereof. As noted above, the code may bestored on a computer-readable or machine-readable storage medium, forexample, in the form of a computer program comprising a plurality ofinstructions executable by one or more processors. The computer-readableor machine-readable storage medium may be non-transitory.

The coding techniques discussed herein may be implemented in an examplevideo encoding and decoding system (e.g., the system 100). In someexamples, a system includes a source device that provides encoded videodata to be decoded at a later time by a destination device. Inparticular, the source device provides the video data to destinationdevice via a computer-readable medium. The source device and thedestination device may comprise any of a wide range of devices,including desktop computers, notebook (i.e., laptop) computers, tabletcomputers, set-top boxes, telephone handsets such as so-called “smart”phones, so-called “smart” pads, televisions, cameras, display devices,digital media players, video gaming consoles, video streaming device, orthe like. In some cases, the source device and the destination devicemay be equipped for wireless communication.

The destination device may receive the encoded video data to be decodedvia the computer-readable medium. The computer-readable medium maycomprise any type of medium or device capable of moving the encodedvideo data from source device to destination device. In one example,computer-readable medium may comprise a communication medium to enablesource device to transmit encoded video data directly to destinationdevice in real-time. The encoded video data may be modulated accordingto a communication standard, such as a wireless communication protocol,and transmitted to destination device. The communication medium maycomprise any wireless or wired communication medium, such as a radiofrequency (RF) spectrum or one or more physical transmission lines. Thecommunication medium may form part of a packet-based network, such as alocal area network, a wide-area network, or a global network such as theInternet. The communication medium may include routers, switches, basestations, or any other equipment that may be useful to facilitatecommunication from source device to destination device.

In some examples, encoded data may be output from output interface to astorage device. Similarly, encoded data may be accessed from the storagedevice by input interface. The storage device may include any of avariety of distributed or locally accessed data storage media such as ahard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile ornon-volatile memory, or any other suitable digital storage media forstoring encoded video data. In a further example, the storage device maycorrespond to a file server or another intermediate storage device thatmay store the encoded video generated by source device. Destinationdevice may access stored video data from the storage device viastreaming or download. The file server may be any type of server capableof storing encoded video data and transmitting that encoded video datato the destination device. Example file servers include a web server(e.g., for a website), an FTP server, network attached storage (NAS)devices, or a local disk drive. Destination device may access theencoded video data through any standard data connection, including anInternet connection. The connection may include a wireless channel(e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem,etc.), or a combination of both that is suitable for accessing encodedvideo data stored on a file server. The transmission of encoded videodata from the storage device may be a streaming transmission, a downloadtransmission, or a combination thereof.

The techniques of the disclosure are not necessarily limited to wirelessapplications or settings. The techniques may be applied to video codingin support of any of a variety of multimedia applications, such asover-the-air television broadcasts, cable television transmissions,satellite television transmissions, Internet streaming videotransmissions, such as dynamic adaptive streaming over HTTP (DASH),digital video that is encoded onto a data storage medium, decoding ofdigital video stored on a data storage medium, or other applications. Insome examples, system may be configured to support one-way or two-wayvideo transmission to support applications such as video streaming,video playback, video broadcasting, and/or video telephony.

In one example the source device includes a video source, a video encoder, and an output interface. The destination device may include an input interface, a video decoder, and a display device. The video encoder of source device may be configured to apply the techniques disclosed herein. In other examples, a source device and a destination device may include other components or arrangements. For example, the source device may receive video data from an external video source, such as an external camera. Likewise, the destination device may interface with an external display device, rather than including an integrated display device.

The example system above is merely one example. Techniques forprocessing video data in parallel may be performed by any digital videoencoding and/or decoding device. Although generally the techniques ofthe disclosure are performed by a video encoding device, the techniquesmay also be performed by a video encoder/decoder, typically referred toas a “CODEC.” Moreover, the techniques of the disclosure may also beperformed by a video preprocessor. Source device and destination deviceare merely examples of such coding devices in which source devicegenerates coded video data for transmission to destination device. Insome examples, the source and destination devices may operate in asubstantially symmetrical manner such that each of the devices includesvideo encoding and decoding components. Hence, example systems maysupport one-way or two-way video transmission between video devices,e.g., for video streaming, video playback, video broadcasting, or videotelephony.

The video source may include a video capture device, such as a videocamera, a video archive containing previously captured video, and/or avideo feed interface to receive video from a video content provider. Asa further alternative, the video source may generate computergraphics-based data as the source video, or a combination of live video,archived video, and computer generated video. In some cases, if videosource is a video camera, source device and destination device may formso-called camera phones or video phones. As mentioned above, however,the techniques described in the disclosure may be applicable to videocoding in general, and may be applied to wireless and/or wiredapplications. In each case, the captured, pre-captured, orcomputer-generated video may be encoded by the video encoder. Theencoded video information may be output by an output interface onto thecomputer-readable medium.

As noted the computer-readable medium may include transient media, suchas a wireless broadcast or wired network transmission, or storage media(that is, non-transitory storage media), such as a hard disk, flashdrive, compact disc, digital video disc, Blu-ray disc, or othercomputer-readable media. In some examples, a network server (not shown)may receive encoded video data from the source device and provide theencoded video data to the destination device, e.g., via networktransmission. Similarly, a computing device of a medium productionfacility, such as a disc stamping facility, may receive encoded videodata from the source device and produce a disc containing the encodedvideo data. Therefore, the computer-readable medium may be understood toinclude one or more computer-readable media of various forms, in variousexamples.

The input interface of the destination device receives information fromthe computer-readable medium. The information of the computer-readablemedium may include syntax information defined by the video encoder,which is also used by the video decoder, that includes syntax elementsthat describe characteristics and/or processing of blocks and othercoded units, e.g., group of pictures (GOP). A display device displaysthe decoded video data to a user, and may comprise any of a variety ofdisplay devices such as a cathode ray tube (CRT), a liquid crystaldisplay (LCD), a plasma display, an organic light emitting diode (OLED)display, or another type of display device. Various embodiments of theapplication have been described.

Specific details of the encoding device 104 and the decoding device 112are shown in FIG. 20 and FIG. 21 respectively. FIG. 20 is a blockdiagram illustrating an example encoding device 104 that may implementone or more of the techniques described in the disclosure. Encodingdevice 104 may, for example, generate the syntax structures describedherein (e.g., the syntax structures of a VPS, SPS, PPS, or other syntaxelements). Encoding device 104 may perform intra-prediction andinter-prediction coding of video blocks within video slices. Aspreviously described, intra-coding relies, at least in part, on spatialprediction to reduce or remove spatial redundancy within a given videoframe or picture. Inter-coding relies, at least in part, on temporalprediction to reduce or remove temporal redundancy within adjacent orsurrounding frames of a video sequence. Intra-mode (I mode) may refer toany of several spatial based compression modes. Inter-modes, such asuni-directional prediction (P mode) or bi-prediction (B mode), may referto any of several temporal-based compression modes.

The encoding device 104 includes a partitioning unit 35, predictionprocessing unit 41, filter unit 63, picture memory 64, summer 50,transform processing unit 52, quantization unit 54, and entropy encodingunit 56. Prediction processing unit 41 includes motion estimation unit42, motion compensation unit 44, and intra-prediction processing unit46. For video block reconstruction, encoding device 104 also includesinverse quantization unit 58, inverse transform processing unit 60, andsummer 62. Filter unit 63 is intended to represent one or more loopfilters such as a deblocking filter, an adaptive loop filter (ALF), anda sample adaptive offset (SAO) filter. Although filter unit 63 is shownin FIG. 20 as being an in loop filter, in other configurations, filterunit 63 may be implemented as a post loop filter. A post processingdevice 57 may perform additional processing on encoded video datagenerated by the encoding device 104. The techniques of the disclosuremay in some instances be implemented by the encoding device 104. Inother instances, however, one or more of the techniques of thedisclosure may be implemented by post processing device 57.

As shown in FIG. 20, the encoding device 104 receives video data, and partitioning unit 35 partitions the data into video blocks. The partitioning may also include partitioning into slices, slice segments, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs. The encoding device 104 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction processing unit 41 may select one of a plurality of possible coding modes, such as one of a plurality of intra-prediction coding modes or one of a plurality of inter-prediction coding modes, for the current video block based on error results (e.g., coding rate and the level of distortion, or the like). Prediction processing unit 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference picture.

Intra-prediction processing unit 46 within prediction processing unit 41may perform intra-prediction coding of the current video block relativeto one or more neighboring blocks in the same frame or slice as thecurrent block to be coded to provide spatial compression. Motionestimation unit 42 and motion compensation unit 44 within predictionprocessing unit 41 perform inter-predictive coding of the current videoblock relative to one or more predictive blocks in one or more referencepictures to provide temporal compression.

Motion estimation unit 42 may be configured to determine theinter-prediction mode for a video slice according to a predeterminedpattern for a video sequence. The predetermined pattern may designatevideo slices in the sequence as P slices, B slices, or GPB slices.Motion estimation unit 42 and motion compensation unit 44 may be highlyintegrated, but are illustrated separately for conceptual purposes.Motion estimation, performed by motion estimation unit 42, is theprocess of generating motion vectors, which estimate motion for videoblocks. A motion vector, for example, may indicate the displacement of aprediction unit (PU) of a video block within a current video frame orpicture relative to a predictive block within a reference picture.

A predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples, the encoding device 104 may calculate values for sub-integer pixel positions of reference pictures stored in picture memory 64. For example, the encoding device 104 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.
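
As a concrete example of the pixel-difference metrics mentioned above, a straightforward SAD over a block can be written as follows; the 8-bit sample format, the row strides, and the function name are assumptions of this sketch.

    #include <stdint.h>
    #include <stdlib.h>

    /* Sum of absolute differences between a candidate predictive block (ref)
     * and the block being coded (cur), both with row strides in samples. */
    static uint32_t block_sad(const uint8_t *cur, int curStride,
                              const uint8_t *ref, int refStride,
                              int width, int height) {
        uint32_t sad = 0;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                sad += (uint32_t)abs((int)cur[y * curStride + x] - (int)ref[y * refStride + x]);
        return sad;
    }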

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in picture memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

Motion compensation, performed by motion compensation unit 44, mayinvolve fetching or generating the predictive block based on the motionvector determined by motion estimation, possibly performinginterpolations to sub-pixel precision. Upon receiving the motion vectorfor the PU of the current video block, motion compensation unit 44 maylocate the predictive block to which the motion vector points in areference picture list. The encoding device 104 forms a residual videoblock by subtracting pixel values of the predictive block from the pixelvalues of the current video block being coded, forming pixel differencevalues. The pixel difference values form residual data for the block,and may include both luma and chroma difference components. Summer 50represents the component or components that perform the subtractionoperation. Motion compensation unit 44 may also generate syntax elementsassociated with the video blocks and the video slice for use by thedecoding device 112 in decoding the video blocks of the video slice.

Intra-prediction processing unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction processing unit 46 may determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction processing unit 46 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction processing unit 46 may select an appropriate intra-prediction mode to use from the tested modes. For example, intra-prediction processing unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and may select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. Intra-prediction processing unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
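
One common way to realize the mode comparison described above is a Lagrangian cost J = D + λ·R evaluated for each tested intra-prediction mode; the description above speaks more generally of comparing distortions and rates, so the cost form, the struct, and the names below are assumptions made for illustration only.

    #include <float.h>
    #include <stdint.h>

    /* Hypothetical per-mode measurement: distortion D and rate R in bits. */
    typedef struct { uint64_t distortion; uint32_t bits; } ModeResult;

    /* Select the tested mode minimizing the Lagrangian cost J = D + lambda * R. */
    static int select_best_mode(const ModeResult *results, int numModes, double lambda) {
        int best = 0;
        double bestCost = DBL_MAX;
        for (int m = 0; m < numModes; m++) {
            double cost = (double)results[m].distortion + lambda * (double)results[m].bits;
            if (cost < bestCost) { bestCost = cost; best = m; }
        }
        return best;
    }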

In any case, after selecting an intra-prediction mode for a block,intra-prediction processing unit 46 may provide information indicativeof the selected intra-prediction mode for the block to entropy encodingunit 56. Entropy encoding unit 56 may encode the information indicatingthe selected intra-prediction mode. The encoding device 104 may includein the transmitted bitstream configuration data definitions of encodingcontexts for various blocks as well as indications of a most probableintra-prediction mode, an intra-prediction mode index table, and amodified intra-prediction mode index table to use for each of thecontexts. The bitstream configuration data may include a plurality ofintra-prediction mode index tables and a plurality of modifiedintra-prediction mode index tables (also referred to as codeword mappingtables). After prediction processing unit 41 generates the predictiveblock for the current video block via either inter-prediction orintra-prediction, the encoding device 104 forms a residual video blockby subtracting the predictive block from the current video block. Theresidual video data in the residual block may be included in one or moreTUs and applied to transform processing unit 52. Transform processingunit 52 transforms the residual video data into residual transformcoefficients using a transform, such as a discrete cosine transform(DCT) or a conceptually similar transform. Transform processing unit 52may convert the residual video data from a pixel domain to a transformdomain, such as a frequency domain.

Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.
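
A simplified view of this step is a uniform scalar quantizer with a step size derived from the quantization parameter. The sketch below is illustrative only: real codecs such as HEVC or EVC use standardized scaling lists and rounding offsets, and the mapping from quantization parameter to step size is not shown.

    #include <stdint.h>
    #include <stdlib.h>

    /* Simplified uniform scalar quantization of n transform coefficients,
     * rounding to the nearest level; qstep is assumed to be a positive step
     * size derived from the quantization parameter. */
    static void quantize_coeffs(const int32_t *coeff, int32_t *qcoeff, int n, int32_t qstep) {
        for (int i = 0; i < n; i++) {
            int32_t sign = coeff[i] < 0 ? -1 : 1;
            qcoeff[i] = sign * ((abs(coeff[i]) + (qstep >> 1)) / qstep);
        }
    }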

Following quantization, entropy encoding unit 56 entropy encodes thequantized transform coefficients. For example, entropy encoding unit 56may perform context adaptive variable length coding (CAVLC), contextadaptive binary arithmetic coding (CABAC), syntax-based context-adaptivebinary arithmetic coding (SBAC), probability interval partitioningentropy (PIPE) coding or another entropy encoding technique. Followingthe entropy encoding by entropy encoding unit 56, the encoded bitstreammay be transmitted to the decoding device 112, or archived for latertransmission or retrieval by the decoding device 112. Entropy encodingunit 56 may also entropy encode the motion vectors and the other syntaxelements for the current video slice being coded.

Inverse quantization unit 58 and inverse transform processing unit 60apply inverse quantization and inverse transformation, respectively, toreconstruct the residual block in the pixel domain for later use as areference block of a reference picture. Motion compensation unit 44 maycalculate a reference block by adding the residual block to a predictiveblock of one of the reference pictures within a reference picture list.Motion compensation unit 44 may also apply one or more interpolationfilters to the reconstructed residual block to calculate sub-integerpixel values for use in motion estimation. Summer 62 adds thereconstructed residual block to the motion compensated prediction blockproduced by motion compensation unit 44 to produce a reference block forstorage in picture memory 64. The reference block may be used by motionestimation unit 42 and motion compensation unit 44 as a reference blockto inter-predict a block in a subsequent video frame or picture.

The encoding device 104 may perform any of the techniques describedherein. Some techniques of the disclosure have generally been describedwith respect to the encoding device 104, but as mentioned above, some ofthe techniques of the disclosure may also be implemented by postprocessing device 57.

The encoding device 104 of FIG. 20 represents an example of a video encoder configured to perform one or more of the transform coding techniques described herein. The encoding device 104 may perform any of the techniques described herein, including the processes described above with respect to FIG. 19.

FIG. 21 is a block diagram illustrating an example decoding device 112.The decoding device 112 includes an entropy decoding unit 80, predictionprocessing unit 81, inverse quantization unit 86, inverse transformprocessing unit 88, summer 90, filter unit 91, and picture memory 92.Prediction processing unit 81 includes motion compensation unit 82 andintra prediction processing unit 84. The decoding device 112 may, insome examples, perform a decoding pass generally reciprocal to theencoding pass described with respect to the encoding device 104 fromFIG. 20.

During the decoding process, the decoding device 112 receives an encodedvideo bitstream that represents video blocks of an encoded video sliceand associated syntax elements sent by the encoding device 104. In someembodiments, the decoding device 112 may receive the encoded videobitstream from the encoding device 104. In some embodiments, thedecoding device 112 may receive the encoded video bitstream from anetwork entity 79, such as a server, a media-aware network element(MANE), a video editor/splicer, or other such device configured toimplement one or more of the techniques described above. Network entity79 may or may not include the encoding device 104. Some of thetechniques described in the disclosure may be implemented by networkentity 79 prior to network entity 79 transmitting the encoded videobitstream to the decoding device 112. In some video decoding systems,network entity 79 and the decoding device 112 may be parts of separatedevices, while in other instances, the functionality described withrespect to network entity 79 may be performed by the same device thatcomprises the decoding device 112.

The entropy decoding unit 80 of the decoding device 112 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction processing unit 81. The decoding device 112 may receive the syntax elements at the video slice level and/or the video block level. Entropy decoding unit 80 may process and parse both fixed-length syntax elements and variable-length syntax elements in one or more parameter sets, such as a VPS, SPS, and PPS.

When the video slice is coded as an intra-coded (I) slice, intraprediction processing unit 84 of prediction processing unit 81 maygenerate prediction data for a video block of the current video slicebased on a signaled intra-prediction mode and data from previouslydecoded blocks of the current frame or picture. When the video frame iscoded as an inter-coded (i.e., B, P or GPB) slice, motion compensationunit 82 of prediction processing unit 81 produces predictive blocks fora video block of the current video slice based on the motion vectors andother syntax elements received from entropy decoding unit 80. Thepredictive blocks may be produced from one of the reference pictureswithin a reference picture list. The decoding device 112 may constructthe reference frame lists, List 0 and List 1, using default constructiontechniques based on reference pictures stored in picture memory 92.

Motion compensation unit 82 determines prediction information for avideo block of the current video slice by parsing the motion vectors andother syntax elements, and uses the prediction information to producethe predictive blocks for the current video block being decoded. Forexample, motion compensation unit 82 may use one or more syntax elementsin a parameter set to determine a prediction mode (e.g., intra- orinter-prediction) used to code the video blocks of the video slice, aninter-prediction slice type (e.g., B slice, P slice, or GPB slice),construction information for one or more reference picture lists for theslice, motion vectors for each inter-encoded video block of the slice,inter-prediction status for each inter-coded video block of the slice,and other information to decode the video blocks in the current videoslice.

Motion compensation unit 82 may also perform interpolation based oninterpolation filters. Motion compensation unit 82 may use interpolationfilters as used by the encoding device 104 during encoding of the videoblocks to calculate interpolated values for sub-integer pixels ofreference blocks. In the above case, motion compensation unit 82 maydetermine the interpolation filters used by the encoding device 104 fromthe received syntax elements, and may use the interpolation filters toproduce predictive blocks.

Inverse quantization unit 86 inverse quantizes, or de-quantizes, thequantized transform coefficients provided in the bitstream and decodedby entropy decoding unit 80. The inverse quantization process mayinclude use of a quantization parameter calculated by the encodingdevice 104 for each video block in the video slice to determine a degreeof quantization and, likewise, a degree of inverse quantization thatshould be applied. Inverse transform processing unit 88 applies aninverse transform (e.g., an inverse DCT or other suitable inversetransform), an inverse integer transform, or a conceptually similarinverse transform process, to the transform coefficients in order toproduce residual blocks in the pixel domain.

After motion compensation unit 82 generates the predictive block for thecurrent video block based on the motion vectors and other syntaxelements, the decoding device 112 forms a decoded video block by summingthe residual blocks from inverse transform processing unit 88 with thecorresponding predictive blocks generated by motion compensation unit82. Summer 90 represents the component or components that perform thesummation operation. If desired, loop filters (either in the coding loopor after the coding loop) may also be used to smooth pixel transitions,or to otherwise improve the video quality. Filter unit 91 is intended torepresent one or more loop filters such as a deblocking filter, anadaptive loop filter (ALF), and a sample adaptive offset (SAO) filter.Although filter unit 91 is shown in FIG. 21 as being an in loop filter,in other configurations, filter unit 91 may be implemented as a postloop filter. The decoded video blocks in a given frame or picture arestored in picture memory 92, which stores reference pictures used forsubsequent motion compensation. Picture memory 92 also stores decodedvideo for later presentation on a display device, such as videodestination device 122 shown in FIG. 1.

The decoding device 112 of FIG. 21 represents an example of a video decoder configured to perform one or more of the transform coding techniques described herein. The decoding device 112 may perform any of the techniques described herein, including the process 1900 described above with respect to FIG. 19.

In the foregoing description, aspects of the application are describedwith reference to specific embodiments thereof, but those skilled in theart will recognize that the subject matter of the application is notlimited thereto. Thus, while illustrative embodiments of the applicationhave been described in detail herein, it is to be understood that theinventive concepts may be otherwise variously embodied and employed, andthat the appended claims are intended to be construed to include suchvariations, except as limited by the prior art. Various features andaspects of the above-described subject matter may be used individuallyor jointly. Further, embodiments can be utilized in any number ofenvironments and applications beyond those described herein withoutdeparting from the broader spirit and scope of the specification. Thespecification and drawings are, accordingly, to be regarded asillustrative rather than restrictive. For the purposes of illustration,methods were described in a particular order. It should be appreciatedthat in alternate embodiments, the methods may be performed in adifferent order than that described.

One of ordinary skill will appreciate that the less than (“<”) andgreater than (“>”) symbols or terminology used herein can be replacedwith less than or equal to (“≤”) and greater than or equal to (“≥”)symbols, respectively, without departing from the scope of thedescription.

Where components are described as being “configured to” perform certainoperations, such configuration can be accomplished, for example, bydesigning electronic circuits or other hardware to perform theoperation, by programming programmable electronic circuits (e.g.,microprocessors, or other suitable electronic circuits) to perform theoperation, or any combination thereof.

Claim language or other language reciting “at least one of” a set and/or“one or more” of a set indicates that one member of the set or multiplemembers of the set (in any combination) satisfy the claim. For example,claim language reciting “at least one of A and B” means A, B, or A andB. In another example, claim language reciting “at least one of A, B,and C” means A, B, C, or A and B, or A and C, or B and C, or A and B andC. The language “at least one of” a set and/or “one or more” of a setdoes not limit the set to the items listed in the set. For example,claim language reciting “at least one of A and B” can mean A, B, or Aand B, and can additionally include items not listed in the set of A andB.

The various illustrative logical blocks, modules, circuits, andalgorithm steps described in connection with the embodiments disclosedherein may be implemented as electronic hardware, computer software,firmware, or combinations thereof. To clearly illustrate theinterchangeability of hardware and software, various illustrativecomponents, blocks, modules, circuits, and steps have been describedabove generally in terms of their functionality. Whether suchfunctionality is implemented as hardware or software depends upon theparticular application and design constraints imposed on the overallsystem. Skilled artisans may implement the described functionality invarying ways for each particular application, but such implementationdecisions should not be interpreted as causing a departure from thescope of the present application.

The techniques described herein may also be implemented in electronic hardware, computer software, firmware, or any combination thereof. Such techniques may be implemented in any of a variety of devices such as general purpose computers, wireless communication device handsets, or integrated circuit devices having multiple uses including application in wireless communication device handsets and other devices. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable data storage medium comprising program code including instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise memory or data storage media, such as random access memory (RAM) such as synchronous dynamic random access memory (SDRAM), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), FLASH memory, magnetic or optical data storage media, and the like. The techniques additionally, or alternatively, may be realized at least in part by a computer-readable communication medium that carries or communicates program code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computer, such as propagated signals or waves.

The program code may be executed by a processor, which may include one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Such a processor may be configured to perform any of the techniques described in the disclosure. A general purpose processor may be a microprocessor; but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Accordingly, the term “processor,” as used herein, may refer to any of the foregoing structure, any combination of the foregoing structure, or any other structure or apparatus suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated in a combined video encoder-decoder (CODEC).

Illustrative examples of the disclosure include:

Example 1. A method of processing video data, the method comprising: obtaining one or more blocks of video data; and determining an affine motion vector to use for predicting a block of the video data, wherein an area of at least one reference picture accessible using the affine motion vector is restricted based on a constraint.

Example 2. The method of example 1, wherein the constraint is based on asize of the block.

Example 3. The method of any one of examples 1 to 2, further comprising:clipping the affine motion vector according to the constraint.

Example 4. The method of any one of examples 1 to 2, further comprising:clipping, according to the constraint, reference sample coordinates ofat least one sample from the at least one reference picture, thereference sample coordinates being determined using the affine motionvector.

Example 5. The method of any one of examples 1 to 4, further comprising:deriving clipping parameters as a function of a size of the block.

Example 6. The method of any one of examples 1 to 5, further comprising: computing parameters for clipping at least one of an affine motion vector or reference sample coordinates once per block from one or more tabulated parameters.

Example 7. An apparatus comprising a memory configured to store videodata and a processor configured to process the video data according toany of examples 1 to 6.

Example 8. The apparatus of example 7, wherein the apparatus includes adecoder.

Example 9. The apparatus of example 7, wherein the apparatus includes anencoder.

Example 10. The apparatus of any one of examples 7 to 9, wherein theapparatus is a mobile device.

Example 11. The apparatus of any one of examples 7 to 10, furthercomprising a display configured to display the video data.

Example 12. The apparatus of any one of examples 7 to 11, furthercomprising a camera configured to capture one or more pictures.

Example 13. A computer readable medium having stored thereoninstructions that when executed by a processor perform the methods ofany of examples 1 to 6.

Example 14. An apparatus for coding video data, the apparatuscomprising: memory; and one or more processors coupled to the memory,the one or more processors being configured to: obtain a current codingblock from the video data; determine control data for the current codingblock; determine one or more affine motion vector clipping parametersfrom the control data; select a sample of the current coding block;determine an affine motion vector for the sample of the current codingblock; and clip the affine motion vector using the one or more affinemotion vector clipping parameters to generate a clipped affine motionvector.

Example 15. The apparatus of example 14, wherein the control datacomprises: a location with associated horizontal coordinate andassociated vertical coordinate in full-sample units; a width variablespecifying a width of the current coding block; a height variablespecifying a height of the current coding block; a horizontal change ofmotion vector; a vertical change of motion vector; a base scaled motionvector; a height of a picture associated with the current coding blockin samples; and a width of the picture in samples.

Example 16. The apparatus of example 15, wherein the one or more affinemotion vector clipping parameters comprise: a horizontal maximumvariable; a horizontal minimum variable; a vertical maximum variable;and a vertical minimum variable.

Example 17. The apparatus of example 16, wherein the horizontal minimumvariable is defined by a maximum value selected from a horizontalminimum picture value and a horizontal minimum motion vector value.

Example 18. The apparatus of example 17, wherein the horizontal minimumpicture value is determined from the associated horizontal coordinate.

Example 19. The apparatus of example 18, wherein the horizontal minimummotion vector value is determined from a center motion vector value, anarray of values based on a resolution value associated with the videodata or a block area size (e.g., a current coding block width×height),and the width variable specifying the width of the current coding block.

Example 20. The apparatus of example 19, wherein the center motionvector value is determined from the base scaled motion vector, thehorizontal change of motion vector, the width variable, and the heightvariable.

Example 21. The apparatus of example 20, wherein the base scaled motionvector corresponds to a top left corner of the current coding block andis determined from control point motion vector values.

Example 22. The apparatus of examples 16-21 above, wherein thehorizontal maximum variable is defined by a minimum value selected froma horizontal maximum picture value and a horizontal maximum motionvector value.

Example 23. The apparatus of example 22, wherein the horizontal maximumpicture value is determined from the width of the picture, theassociated horizontal coordinate, and the width variable.

Example 24. The apparatus of example 23, wherein the horizontal maximummotion vector value is determined from a center motion vector value, anarray of values based on a resolution value associated with the videodata or a block area size (e.g., a current coding block width×height),and the width variable specifying the width of the current coding block.

Example 25. The apparatus of example 24, wherein the center motionvector value is determined from the base scaled motion vector, thehorizontal change of motion vector, the width variable, and the heightvariable.

Example 26. The apparatus of example 25, wherein the base scaled motionvector corresponds to a corner of the current coding block and isdetermined from control point motion vector values.

Example 27. The apparatus of examples 16-26 above, wherein the verticalmaximum variable is defined by a minimum value selected from a verticalmaximum picture value and a vertical maximum motion vector value.

Example 28. The apparatus of example 27, wherein the vertical maximumpicture value is determined from the height of the picture, theassociated vertical coordinate, and the height variable.

Example 29. The apparatus of example 28, wherein the vertical maximum motion vector value is determined from a center motion vector value, an array of values based on a resolution value associated with the video data or a block area size (e.g., a current coding block width×height), and the height variable specifying the height of the current coding block.

Example 30. The apparatus of examples 16-29 above, wherein the vertical minimum variable is defined by a maximum value selected from a vertical minimum picture value and a vertical minimum motion vector value.

Example 31. The apparatus of example 30, wherein the vertical minimumpicture value is determined from the associated vertical coordinate.

Example 32. The apparatus of example 31, wherein the vertical minimummotion vector value is determined from a center motion vector value, anarray of values based on a resolution value associated with the videodata or a block area size (e.g., a current coding block width×height),and the height variable specifying the height of the current codingblock.

Example 33. The apparatus of examples 14-32, wherein the one or more processors are configured to: sequentially obtain a plurality of current coding blocks from the video data; determine a set of affine motion vector clipping parameters on a per coding block basis for blocks of the plurality of current coding blocks; and fetch portions of corresponding reference pictures using the set of affine motion vector clipping parameters on the per block basis for the plurality of current coding blocks.

Example 34. The apparatus of examples 14-33, wherein the one or moreprocessors are configured to: identify a reference picture associatedwith the current coding block; and store a portion of the referencepicture defined by the one or more affine motion vector clippingparameters.

Example 35. The apparatus of example 34, further comprising a memorybuffer coupled to the one or more processors, wherein the portion of thereference picture is stored in the memory buffer for affine motionprocessing operations using the current coding block.

Example 36. The apparatus of examples 14-35, wherein the one or moreprocessors are configured to: process the current coding block usingreference picture data from a reference picture indicated by the clippedaffine motion vector.

Example 37. The apparatus of examples 14-36, wherein the affine motion vector for the sample of the current coding block is determined according to a first base scaled motion vector value, a first horizontal change of motion vector value, a first vertical change of motion vector value, a second base scaled motion vector value, a second horizontal change of motion vector value, a second vertical change of motion vector value, a horizontal coordinate of the sample, and a vertical coordinate of the sample.

Example 38. The apparatus of examples 14-37, wherein the control datacomprises values from a derivation table.

Example 39. The apparatus of examples 14-38, wherein the current codingblock is a luma coding block.

Example 40. The apparatus of examples 14-39, further comprising: adisplay device coupled to the one or more processors and configured todisplay images from the video data; and one or more wireless interfacescoupled to the one or more processors, the one or more wirelessinterfaces comprising one or more baseband processors and one or moretransceivers.

Example 41. A method of coding video data, the method comprising:obtaining a current coding block from the video data; determiningcontrol data for the current coding block; determining one or moreaffine motion vector clipping parameters from the control data;selecting a sample of the current coding block; determining an affinemotion vector for the sample of the current coding block; and clippingthe affine motion vector using the one or more affine motion vectorclipping parameters to generate a clipped affine motion vector.

Example 42. The method of example 41 in accordance with any of examples14-40.

Example 43. A non-transitory computer readable medium comprisinginstructions that, when executed by one or more processors of a codingdevice, cause the device to perform video coding operations on a videodata in accordance with any of examples 14-40 above.

Example 44. An apparatus for coding video data, the apparatuscomprising: means for obtaining a current coding block from the videodata; means for determining control data for the current coding block;means for determining one or more affine motion vector clippingparameters from the control data; means for selecting a sample of thecurrent coding block; means for determining an affine motion vector forthe sample of the current coding block; and means for clipping theaffine motion vector using the one or more affine motion vector clippingparameters to generate a clipped affine motion vector.

Example 45. The apparatus for coding video data of example 44, inaccordance with any of examples 14-40 above.

Example 46. A non-transitory computer-readable storage medium comprisinginstructions stored thereon which, when executed by one or moreprocessors, cause the one or more processors to: obtain a current codingblock from video data; determine control data for the current codingblock; determine one or more affine motion vector clipping parametersfrom the control data; select a sample of the current coding block;determine an affine motion vector for the sample of the current codingblock; and clip the affine motion vector using the one or more affinemotion vector clipping parameters to generate a clipped affine motionvector.

Example 47. The non-transitory computer-readable medium of example 46,including instructions to cause the one or more processors to operate inaccordance with any of examples 14-40 above.

What is claimed is:
 1. An apparatus for coding video data, the apparatus comprising: memory; and one or more processors coupled to the memory, the one or more processors being configured to: obtain a current coding block from the video data; determine control data for the current coding block, the control data comprising a location with a horizontal coordinate and a vertical coordinate in full-sample units, a width variable specifying a width of the current coding block, a height variable specifying a height of the current coding block, a horizontal change of a motion vector, a vertical change of the motion vector, and a base scaled motion vector; determine one or more affine motion vector clipping parameters from the control data; select a sample of the current coding block; determine an affine motion vector for the sample of the current coding block; and clip the affine motion vector using the one or more affine motion vector clipping parameters to generate a clipped affine motion vector.
 2. The apparatus of claim 1, wherein the control data further comprises: a height of a picture associated with the current coding block in samples; and a width of the picture in samples.
 3. The apparatus of claim 2, wherein the one or more affine motion vector clipping parameters comprise: a horizontal maximum variable; a horizontal minimum variable; a vertical maximum variable; and a vertical minimum variable.
 4. The apparatus of claim 3, wherein the horizontal minimum variable is defined by a maximum value selected from a horizontal minimum picture value and a horizontal minimum motion vector value.
 5. The apparatus of claim 4, wherein the horizontal minimum picture value is determined from the associated horizontal coordinate.
 6. The apparatus of claim 5, wherein the horizontal minimum motion vector value is determined from a center motion vector value, an array of values based on a resolution value associated with the video data, and the width variable specifying the width of the current coding block.
7. The apparatus of claim 6, wherein the center motion vector value is determined from the base scaled motion vector, the horizontal change of motion vector, the width variable, and the height variable.
8. The apparatus of claim 7, wherein the base scaled motion vector corresponds to a top left corner of the current coding block and is determined from control point motion vector values.
9. The apparatus of claim 3, wherein the horizontal maximum variable is defined by a minimum value selected from a horizontal maximum picture value and a horizontal maximum motion vector value.
10. The apparatus of claim 9, wherein the horizontal maximum picture value is determined from the width of the picture, the associated horizontal coordinate, and the width variable.
11. The apparatus of claim 10, wherein the horizontal maximum motion vector value is determined from a center motion vector value, an array of values based on a resolution value associated with the video data, and the width variable specifying the width of the current coding block.
12. The apparatus of claim 11, wherein the center motion vector value is determined from the base scaled motion vector, the horizontal change of motion vector, the width variable, and the height variable.
13. The apparatus of claim 12, wherein the base scaled motion vector corresponds to a corner of the current coding block and is determined from control point motion vector values.
14. The apparatus of claim 3, wherein the vertical maximum variable is defined by a minimum value selected from a vertical maximum picture value and a vertical maximum motion vector value.
15. The apparatus of claim 14, wherein the vertical maximum picture value is determined from the height of the picture, the associated vertical coordinate, and the height variable.
16. The apparatus of claim 15, wherein the vertical maximum motion vector value is determined from a center motion vector value, an array of values based on a block area size associated with the video data, and the height variable specifying the height of the current coding block.
17. The apparatus of claim 3, wherein the vertical minimum variable is defined by a maximum value selected from a vertical minimum picture value and a vertical minimum motion vector value.
18. The apparatus of claim 17, wherein the vertical minimum picture value is determined from the associated vertical coordinate.
19. The apparatus of claim 18, wherein the vertical minimum motion vector value is determined from a center motion vector value, an array of values based on a block area size associated with the video data, and the height variable specifying the height of the current coding block.
20. The apparatus of claim 1, wherein the one or more processors are configured to: sequentially obtain a plurality of current coding blocks from the video data; determine a set of affine motion vector clipping parameters on a per coding block basis for blocks of the plurality of current coding blocks; and fetch portions of corresponding reference pictures using the set of affine motion vector clipping parameters on the per block basis for the plurality of current coding blocks.
21. The apparatus of claim 1, wherein the one or more processors are configured to: identify a reference picture associated with the current coding block; and store a portion of the reference picture defined by the one or more affine motion vector clipping parameters.
22. The apparatus of claim 21, further comprising a memory buffer coupled to the one or more processors, wherein the portion of the reference picture is stored in the memory buffer for affine motion processing operations using the current coding block.
23. The apparatus of claim 1, wherein the one or more processors are configured to: process the current coding block using reference picture data from a reference picture indicated by the clipped affine motion vector.
24. The apparatus of claim 1, wherein the affine motion vector for the sample of the current coding block is determined according to a first base scaled motion vector value, a first horizontal change of motion vector value, a first vertical change of motion vector value, a second base scaled motion vector value, a second horizontal change of motion vector value, a second vertical change of motion vector value, a horizontal coordinate of the sample, and a vertical coordinate of the sample.
25. The apparatus of claim 1, wherein the control data comprises values from a derivation table.
26. The apparatus of claim 1, wherein the current coding block is a luma coding block.
27. The apparatus of claim 1, further comprising: a display device coupled to the one or more processors and configured to display images from the video data; and one or more wireless interfaces coupled to the one or more processors, the one or more wireless interfaces comprising one or more baseband processors and one or more transceivers.
28. A method of coding video data, the method comprising: obtaining a current coding block from the video data; determining control data for the current coding block, the control data comprising a location with a horizontal coordinate and a vertical coordinate in full-sample units, a width variable specifying a width of the current coding block, a height variable specifying a height of the current coding block, a horizontal change of a motion vector, a vertical change of the motion vector, and a base scaled motion vector; determining one or more affine motion vector clipping parameters from the control data; selecting a sample of the current coding block; determining an affine motion vector for the sample of the current coding block; and clipping the affine motion vector using the one or more affine motion vector clipping parameters to generate a clipped affine motion vector.
29. The method of claim 28, wherein the one or more affine motion vector clipping parameters comprise: a horizontal maximum variable; a horizontal minimum variable; a vertical maximum variable; and a vertical minimum variable.
30. The method of claim 29, wherein the horizontal minimum variable is defined by a maximum value selected from a horizontal minimum picture value and a horizontal minimum motion vector value.
31. The method of claim 30, wherein the horizontal minimum picture value is determined from the associated horizontal coordinate.
32. The method of claim 31, wherein the horizontal minimum motion vector value is determined from a center motion vector value, an array of values based on a block area size, and the width variable specifying the width of the current coding block.
33. The method of claim 32, wherein the center motion vector value is determined from the base scaled motion vector, the horizontal change of motion vector, the width variable, and the height variable.
34. The method of claim 33, wherein the base scaled motion vector corresponds to a top left corner of the current coding block and is determined from control point motion vector values.
35. The method of claim 29, wherein the horizontal maximum variable is defined by a minimum value selected from a horizontal maximum picture value and a horizontal maximum motion vector value.
36. The method of claim 35, wherein the horizontal maximum picture value is determined from the associated horizontal coordinate and the width variable.
37. The method of claim 36, wherein the horizontal maximum motion vector value is determined from a center motion vector value, an array of values based on a block area size associated with the video data, and the width variable specifying the width of the current coding block.
38. The method of claim 37, wherein the center motion vector value is determined from the base scaled motion vector, the horizontal change of motion vector, the width variable, and the height variable.
39. The method of claim 38, wherein the base scaled motion vector corresponds to a corner of the current coding block and is determined from control point motion vector values.
40. The method of claim 29, wherein the vertical maximum variable is defined by a minimum value selected from a vertical maximum picture value and a vertical maximum motion vector value.
41. The method of claim 40, wherein the vertical maximum picture value is determined from the associated vertical coordinate and the height variable.
42. The method of claim 41, wherein the vertical maximum motion vector value is determined from a center motion vector value, an array of values based on a block area size associated with the video data, and the height variable specifying the height of the current coding block.
43. The method of claim 29, wherein the vertical minimum variable is defined by a maximum value selected from a vertical minimum picture value and a vertical minimum motion vector value.
44. The method of claim 43, wherein the vertical minimum picture value is determined from the associated vertical coordinate.
45. The method of claim 44, wherein the vertical minimum motion vector value is determined from a center motion vector value, an array of values based on a block area size associated with the video data, and the height variable specifying the height of the current coding block.
46. The method of claim 28, further comprising: sequentially obtaining a plurality of current coding blocks from the video data; determining a set of affine motion vector clipping parameters on a per coding block basis for blocks of the plurality of current coding blocks; and fetching portions of corresponding reference pictures using the set of affine motion vector clipping parameters on the per block basis for the plurality of current coding blocks.
47. The method of claim 28, further comprising: identifying a reference picture associated with the current coding block; and storing a portion of the reference picture defined by the one or more affine motion vector clipping parameters.
48. The method of claim 47, wherein the portion of the reference picture is stored in a memory buffer for affine motion processing operations using the current coding block.
49. The method of claim 28, further comprising: processing the current coding block using reference picture data from a reference picture indicated by the clipped affine motion vector.
50. The method of claim 28, wherein the affine motion vector for the sample of the current coding block is determined according to a first base scaled motion vector value, a first horizontal change of motion vector value, a first vertical change of motion vector value, a second base scaled motion vector value, a second horizontal change of motion vector value, a second vertical change of motion vector value, a horizontal coordinate of the sample, and a vertical coordinate of the sample.
51. The method of claim 28, wherein the control data comprises values from a derivation table.
52. The method of claim 28, wherein the current coding block is a luma coding block.
53. A non-transitory computer-readable storage medium comprising instructions stored thereon which, when executed by one or more processors, cause the one or more processors to: obtain a current coding block from video data; determine control data for the current coding block, the control data comprising a location with a horizontal coordinate and a vertical coordinate in full-sample units, a width variable specifying a width of the current coding block, a height variable specifying a height of the current coding block, a horizontal change of a motion vector, a vertical change of the motion vector, and a base scaled motion vector; determine one or more affine motion vector clipping parameters from the control data; select a sample of the current coding block; determine an affine motion vector for the sample of the current coding block; and clip the affine motion vector using the one or more affine motion vector clipping parameters to generate a clipped affine motion vector.
54. The method of claim 28, wherein the control data further comprises: a height of a picture associated with the current coding block in samples; and a width of the picture in samples.
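
Claims 3-19 and 29-45 recite forming each clipping bound from a picture-boundary term and a motion-vector term built around a center motion vector and a block-size-dependent array of values. The C++ sketch below follows that structure only in outline; the specific shifts, the deviation table and its indexing, and every identifier are assumptions chosen for readability, not the recited derivation.

    #include <algorithm>
    #include <cstdint>

    struct AffineClipParams { int32_t horMin, horMax, verMin, verMax; };

    // Floor of log2 for positive values (helper; form assumed).
    static int floorLog2(int v) { int n = 0; while (v > 1) { v >>= 1; ++n; } return n; }

    // Derive the clipping parameters from the control data. Picture terms depend on
    // the block location and picture size; motion-vector terms depend on a center
    // motion vector and a deviation array indexed by block dimension. All scaling
    // choices here are assumptions.
    AffineClipParams deriveAffineClipParams(
        int xCb, int yCb,                 // block location in full-sample units
        int cbWidth, int cbHeight,        // block width and height
        int picWidth, int picHeight,      // picture width and height in samples
        int32_t baseMvX, int32_t baseMvY, // base scaled motion vector
        int32_t dHorX, int32_t dHorY,     // horizontal change of motion vector
        int32_t dVerX, int32_t dVerY,     // vertical change of motion vector
        const int32_t deviation[8],       // block-size-dependent deviations (assumed)
        int mvShift) {                    // fractional-MV precision shift (assumed)
        // Center motion vector: the base scaled motion vector propagated to the
        // block center using the width and height variables.
        int32_t centerMvX = baseMvX + dHorX * (cbWidth >> 1) + dVerX * (cbHeight >> 1);
        int32_t centerMvY = baseMvY + dHorY * (cbWidth >> 1) + dVerY * (cbHeight >> 1);
        int32_t devX = deviation[floorLog2(cbWidth)];
        int32_t devY = deviation[floorLog2(cbHeight)];

        // Picture terms: minimums from the block coordinates, maximums from the
        // picture size, the coordinates, and the block dimensions (forms assumed).
        int32_t horMinPic = -(xCb << mvShift);
        int32_t horMaxPic = (picWidth - xCb - cbWidth) << mvShift;
        int32_t verMinPic = -(yCb << mvShift);
        int32_t verMaxPic = (picHeight - yCb - cbHeight) << mvShift;

        // Each minimum is the maximum of its picture term and its motion-vector
        // term; each maximum is the minimum of the corresponding pair.
        AffineClipParams p;
        p.horMin = std::max(horMinPic, centerMvX - devX);
        p.horMax = std::min(horMaxPic, centerMvX + devX);
        p.verMin = std::max(verMinPic, centerMvY - devY);
        p.verMax = std::min(verMaxPic, centerMvY + devY);
        return p;
    }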
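
Claims 20-22 and 46-48 describe storing the portion of a reference picture that the clipping parameters define, so affine motion compensation for the block can be served from a local buffer. The sketch below, which reuses the AffineClipParams type from the previous sketch, shows one plausible way to bound that portion; the interpolation-filter margin and the shift back to full-sample units are assumptions.

    struct RefRegion { int x0, y0, x1, y1; };  // inclusive full-sample bounds

    // Bound the reference-picture region to fetch for the current block, given the
    // clipping parameters. filterTaps is the interpolation filter length (assumed).
    RefRegion refRegionForBlock(int xCb, int yCb, int cbWidth, int cbHeight,
                                AffineClipParams p, int mvShift, int filterTaps) {
        int margin = filterTaps / 2;  // extra samples required by interpolation
        RefRegion r;
        r.x0 = xCb + (p.horMin >> mvShift) - margin;
        r.y0 = yCb + (p.verMin >> mvShift) - margin;
        r.x1 = xCb + cbWidth  - 1 + (p.horMax >> mvShift) + margin;
        r.y1 = yCb + cbHeight - 1 + (p.verMax >> mvShift) + margin;
        return r;
    }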